Foreword


(This foreword is not part of the standard)

[...] developed with inputs from the TIA-APCO Project 25 Interface Committee (APIC), the APIC
Vocoder Task Group, and TIA Industry Members. This Standard will be maintained by 
Working Group 8.4 of TR-8, under the sponsorship of TIA. 

This Recommended Standard describes the vocoder for land mobile radios meeting the 
Project 25 requirements. Publication of this Standard comes nearly three years after issuance
of an Interim Standard. This document has undergone rigorous public review, and
implementations of this Vocoder have been created from this description. This document 
provides only the appropriate requirements to ensure vocoder compatibility for Project 25 
systems. 

For information on specific implementations, as they are developed, the reader is referred to 
the Project 25 System and Standard Definition Document originally published as TSB102

1 Scope

This document specifies the voice coding method for the Project 25 System and Standard 
Definition IS 102 (originally published as a TSB). It describes the functional requirements 
for the transmission and reception of voice information using digital communication media 
described in the standard. This document is specifically intended to define the conversion 
of voice from an analog representation to a digital representation that consists of a net bit 
rate of 4.4 kbps for voice information, and a gross bit rate of 7.2 kbps after error control 
coding. This standard is compatible with the requirement for voice communication over 
the Project 25 Common Air Interface (TIA Document No. 102BAAA). 

The voice coder (or vocoder) presented in this document is intended to be used throughout
Project 25 in any equipment that requires an analog-to-digital or digital-to-analog voice
interface. Specifically, mobile and portable radios as well as console equipment and gateways
to voice networks may contain the vocoder described in this document. The reader is
referred to the Project 25 Shell Standard for additional information on the integration of
the vocoder into the overall communication system.

2 Introduction 

This document provides a functional description of the Improved Multi-Band Excitation
(IMBE) voice coding algorithm adopted as the Project 25 vocoder standard. This document
describes the essential operations that are necessary and sufficient to implement this
voice coding algorithm. However, it is highly recommended that the references be studied
prior to the implementation of this algorithm. It is also recommended that implementations
begin with a high-level language simulation of the algorithm, and then proceed to
a real-time implementation using a digital signal processor. High performance real-time
implementations have been demonstrated using both floating-point and fixed-point processors.
The reader is cautioned that this document does not attempt to describe the most
efficient means of implementing the IMBE vocoder. The reader should consult one or more
references on efficient real-time programming for more information on this subject.
Additionally, this document does not address vocoder testing and verification. These subjects
will be addressed in separate documents that may be released at a later time.

Fig. 1: Improved Multi-Band Excitation Speech Coder

The IMBE speech coder is based on a robust speech model which is referred to as 
the Multi-Band Excitation (MBE) speech model [3]. The basic methodology of the coder 
is to divide a digital speech input signal into overlapping speech segments (or frames) 
using a window such as a Kaiser window. Each speech frame is then compared with the 
underlying speech model, and a set of model parameters is estimated for that particular
frame. The encoder quantizes these model parameters and transmits a bit stream at 7.2 
kbps. The decoder receives this bit stream, reconstructs the model parameters, and uses 
these model parameters to generate a synthetic speech signal. This synthesized speech signal 
is the output of the IMBE speech coder as shown in Figure 1. One should note that the 
IMBE speech coder shown in this figure and defined by this document is a digital-to-digital 
function. 

The IMBE speech coder is a model-based speech coder, or vocoder, which does not try 
to reproduce the input speech signal on a sample by sample basis. Instead the IMBE speech 
coder constructs a synthetic speech signal that contains the same perceptual information as 
the original speech signal. Many previous vocoders (such as LPC vocoders, homomorphic 
vocoders, and channel vocoders) have not been successful in producing high quality synthetic 
speech. The IMBE speech coder has two primary advantages over these vocoders. First, 
the IMBE speech coder is based on the MBE speech model which is a more robust model
than the traditional speech models used in previous vocoders. Second, the IMBE speech
coder uses more sophisticated algorithms to estimate the speech model parameters, and to 
synthesize the speech signal from these model parameters. 

This document is organized as follows. In Section 3 the MBE speech model is briefly
reviewed. This section presents background material which is useful in understanding the
operation of the IMBE speech coder. Section 4 describes the basic speech input/output
requirements. Section 5 examines the methods used to estimate the speech model parameters, and
Section 6 examines the quantization and reconstruction of the MBE model parameters. The
error correction and the format of the 7.2 kbps bit stream are discussed in Section 7. This
is followed by Section 8 which describes the enhancement of the spectral amplitudes, and
Section 9 which describes the adaptive smoothing method used to reduce the effect of
uncorrectable bit errors. Section 10 then demonstrates the encoding of a typical set of model
parameters. Section 11 discusses the synthesis of speech from the MBE model parameters.
A few additional comments on the algorithm and this document are provided in Section 12.
Other information such as bit allocation tables, quantization levels and initialization vectors
is contained in the attached annexes. In addition, Annex K contains a set of
flow charts describing certain elements of this vocoder. Note that these flow charts have
been designed to help clarify the various algorithmic steps and do not necessarily describe
the best or most efficient method of implementing the vocoder.

3 Multi-Band Excitation Speech Model 

Let $s(n)$ denote a discrete speech signal obtained by sampling an analog speech signal. In
order to focus attention on a short segment of speech over which the model parameters are
assumed to be constant, a window $w(n)$ is applied to the speech signal $s(n)$. The windowed
speech signal $s_w(n)$ is defined by

$$ s_w(n) = s(n)\,w(n) \tag{1} $$

The sequence $s_w(n)$ is referred to as a speech segment or a speech frame. The IMBE analysis
algorithm actually uses two different windows, $w_R(n)$ and $w_I(n)$, each of which is applied
separately to the speech signal via Equation (1). This will be explained in more detail in
Section 5 of this document.

TIA/EIA 102.BABA

The speech signal $s(n)$ is shifted in time to select any desired
segment. For notational convenience $s_w(n)$ refers to the current speech frame. The next
speech frame is obtained by shifting $s(n)$ by 20 ms.

A speech segment $s_w(n)$ is modelled as the response of a linear filter $h_w(n)$ to some
excitation signal $e_w(n)$. Therefore, $S_w(\omega)$, the Fourier Transform of $s_w(n)$, can be
expressed as

$$ S_w(\omega) = H_w(\omega)\,E_w(\omega) \tag{2} $$

where $H_w(\omega)$ and $E_w(\omega)$ are the Fourier Transforms of $h_w(n)$ and $e_w(n)$, respectively.

In traditional speech models, speech is divided into two classes depending upon the nature
of the excitation signal. For voiced speech the excitation signal is a periodic impulse
sequence, where the distance between impulses is the pitch period $P_0$. For unvoiced speech
the excitation signal is a white noise sequence. One of the primary distinctions between
traditional vocoders is the method in which they model the linear filter $h_w(n)$. The frequency
response of this filter is generally referred to as the spectral envelope of the speech signal.
In an LPC vocoder, for example, the spectral envelope is modelled with a low order all-pole
model. Similarly, in a homomorphic vocoder, the spectral envelope is modelled with a small
number of cepstral coefficients.

A primary difference between traditional speech models and the MBE speech model 
is the excitation signal. In conventional speech models a single voiced/unvoiced (V/UV) 
decision is used for each speech segment. In contrast the MBE speech model divides the 
excitation spectrum into a number of non-overlapping frequency bands and makes a V/UV 
decision for each frequency band. This allows the excitation signal for a particular speech 
segment to be a mixture of periodic (voiced) energy and noise-like (unvoiced) energy. This 
added degree of freedom in the modelling of the excitation allows the MBE speech model 
to generate higher quality speech than conventional speech models. In addition it allows 
the MBE speech model to be robust to the presence of background noise. 

In the MBE speech model the excitation spectrum is obtained from the pitch period (or
the fundamental frequency) and the V/UV decisions. A periodic spectrum is used in the
frequency bands declared voiced, while a random noise spectrum is used in the frequency
bands declared unvoiced. The periodic spectrum is generated from a windowed periodic
impulse train which is completely determined by the window and the pitch period. The
random noise spectrum is generated from a windowed random noise sequence.

Fig. 2: Comparison of Traditional and MBE Speech Models

A comparison of a traditional speech model and the MBE speech model is shown in 
Figure 2. In this example the traditional model has classified the speech segment as voiced, 
and consequently the traditional speech model is composed completely of periodic energy. 
The MBE model has divided the spectrum into 10 frequency bands in this example. The 
fourth, fifth, ninth and tenth bands have been declared unvoiced while the remaining bands 
have been declared voiced. The excitation in the MBE model is comprised of periodic energy 
only in the frequency bands declared voiced, while the remaining bands are comprised of 
noise-like energy. This example shows an important feature of the MBE speech model. 
Namely, the V/UV determination is performed such that frequency bands where the ratio 
of periodic energy to noise-like energy is high are declared voiced, while frequency bands 
where this ratio is low are declared unvoiced. The details of this procedure are discussed in 
Section 5.2. 

Fig. 3: Analog Front End


4 Speech Input/Output Requirements 

This section presents a number of performance recommendations for the analog front end of 
a voice codec, including the gain, filtering, and conversion elements as depicted in Figure 3. 
The objective is to establish a set of input/output requirements that will ensure that the 
voice codec operates at its maximum capability. The reader should note that Figure 3 shows 
four reference points (analog input, analog output, digital input and digital output) which 
are used in this document and will be used in future documents describing the test and 
verification procedure used with this vocoder. 

The voice encoder and decoder defined in the remainder of this document operate with
unity (i.e. 0 dB) gain. Consequently the analog input and output gain elements shown in
Figure 3 are only used to match the sensitivity of the microphone and speaker with the
A-to-D converters and D-to-A converters, respectively. It is recommended that the analog
input gain be set such that the RMS speech level under nominal input conditions is 25 dB
below the saturation point of the A-to-D converter. This level (-22 dBm0) is designed to
provide sufficient margin to prevent the peaks of the speech waveform from being clipped
by the A-to-D converter.

The voice coder defined in this document requires the A-to-D and D-to-A converters
to operate at an 8 kHz sampling rate (i.e. a sampling period of 125 microseconds) at the
digital input/output reference points. This requirement necessitates the use of analog filters
at both the input and output to eliminate any frequency components above the Nyquist
frequency (4 kHz). The recommended input and output filter masks are shown in Figure 4.
For proper operation, the frequency response of the analog filters should be bounded by the
shaded zone depicted in this figure.

Fig. 4: Analog Input/Output Filter Mask

This vocoder description assumes that the A-to-D converter produces digital speech 
which is confined to the range [-32768, 32767], and similarly that the D-to-A converter 
accepts digital speech within this same range. If a converter is used which does not meet 
these assumptions then the digital gain elements shown in Figure 1 should be adjusted 
appropriately. Note that these assumptions are automatically satisfied if 16 bit linear
A-to-D and D-to-A converters are used, in which case the digital gain elements should be set to
unity gain. Also note that the vocoder requires that any companding which is applied by the
A-to-D converter (i.e. A-law or μ-law) should be removed prior to speech encoding. Similarly
any companding used by the D-to-A converter must be applied after speech decoding.

Fig. 5: IMBE Speech Analysis Algorithm


5 Speech Analysis 

This section presents the methods used to estimate the MBE speech model parameters.
To develop a high quality vocoder it is essential that robust and accurate algorithms are
used to estimate the model parameters. The approach which is presented here differs from
conventional approaches in a fundamental way. Typically algorithms for the estimation of
the excitation parameters and algorithms for the estimation of the spectral envelope
parameters operate independently. These parameters are usually estimated based on some
reasonable but heuristic criterion without explicit consideration of how close the
synthesized speech will be to the original speech. This can result in a synthetic spectrum quite
different from the original spectrum. In the approach used in the IMBE speech coder the
excitation and spectral envelope parameters are estimated simultaneously, so that the
synthesized spectrum is closest in a least squares sense to the original speech spectrum. This
approach can be viewed as an “analysis-by-synthesis” method. The theoretical derivation
and justification of this approach are presented in references [3, 4, 6].

A block diagram of the analysis algorithm is shown in Figure 5. The MBE speech
model parameters which must be estimated for each speech frame are the pitch period (or
equivalently the fundamental frequency), the V/UV decisions, and the spectral amplitudes
which characterize the spectral envelope. The organization of this section is as follows.
First, the pitch estimation method is presented in Section 5.1. The V/UV determination
is discussed in Section 5.2, and finally Section 5.3 discusses the estimation of the spectral
amplitudes.

Fig. 6: High Pass Filter Frequency Response at 8 kHz Sampling Rate

The input to the speech analyzer, and, consequently, the encoder, is a discrete speech
signal generated using an A-to-D converter as described in Section 4. This speech signal
must first be digitally filtered to remove any residual energy at D.C. This is accomplished
by passing the input signal through a discrete high-pass filter with the following transfer
function:

$$ H(z) = \frac{1 - z^{-1}}{1 - .99\,z^{-1}} \tag{3} $$

The resulting high-pass filtered signal is denoted by $s(n)$ throughout the remainder of this
section. Figure 6 shows the frequency response of the filter specified in Equation (3) using
the convention that the Nyquist frequency (4 kHz) is mapped to a discrete frequency of $\pi$
radians. For more information on this frequency convention, which is used throughout this
document, the reader is referred to reference [11].
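
As an informal illustration, the transfer function of Equation (3) corresponds to the difference equation y(n) = x(n) - x(n-1) + .99 y(n-1). The Python sketch below applies it sample by sample. The function name `remove_dc` and the zero initial state are assumptions made for the example and are not specified by this document.

```python
def remove_dc(samples, pole=0.99):
    """Apply H(z) = (1 - z^-1) / (1 - pole * z^-1) to a sequence of samples.

    The filter state (previous input and previous output) is assumed to
    start at zero; the document does not state an initial condition here.
    """
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Difference equation derived from Equation (3).
        y = x - prev_in + pole * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

A constant (D.C.) input decays geometrically toward zero at the output, which is the behaviour the filter is intended to provide.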

Fig. 7: Relationship between Speech Frames


5.1 Pitch Estimation 


The objective in pitch estimation is to determine the pitch $P_0$ corresponding to the “current”
speech frame $s_w(n)$. $P_0$ is related to the fundamental frequency $\omega_0$ by

$$ P_0 = \frac{2\pi}{\omega_0} \tag{4} $$

where $P_0$ is measured in samples (at 8 kHz) and $\omega_0$ is measured in radians.

The pitch estimation algorithm attempts to preserve some continuity of the pitch between
neighboring speech frames. A pitch tracking algorithm considers the pitch from
previous and future frames when determining the pitch of the current frame. Previous and
future speech frames are obtained by shifting the speech signal in 160 sample (20 ms) time
increments prior to the application of the window in Equation (1). The pitches corresponding
to the two future speech frames are denoted by $P_1$ and $P_2$. Similarly, the pitches of the
two previous speech frames are denoted by $P_{-1}$ and $P_{-2}$. These relationships are shown in
Figure 7.

The pitch is estimated using a two-step procedure. First an initial pitch estimate,
denoted by $\hat{P}_I$, is obtained. The initial pitch estimate is restricted to be a member of the
set {21, 21.5, ... 121.5, 122}. It is then refined to obtain the final estimate of the fundamental
frequency $\hat{\omega}_0$, which has one-quarter-sample accuracy. This two-step procedure is used in
part to reduce the computational complexity, and in part to improve the robustness of the
pitch estimate.
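
For orientation, the candidate set and the conversion of Equation (4) can be written out as a short Python sketch. The helper names and the conversion to Hz are illustrative assumptions. Only the set {21, 21.5, ... 121.5, 122}, the 8 kHz sampling rate and Equation (4) come from this document.

```python
import math

def initial_pitch_candidates():
    """Half-sample grid {21, 21.5, ..., 121.5, 122} used for the initial estimate."""
    return [21 + 0.5 * i for i in range(203)]

def pitch_to_omega(pitch_samples):
    """Equation (4): fundamental frequency in radians for a pitch period in samples."""
    return 2.0 * math.pi / pitch_samples

def omega_to_hz(omega, sample_rate_hz=8000.0):
    """Convert a discrete frequency in radians to Hz at the 8 kHz sampling rate."""
    return omega * sample_rate_hz / (2.0 * math.pi)
```

Under these assumptions the pitch range of 21 to 122 samples corresponds to fundamental frequencies between roughly 66 Hz and 381 Hz.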

One important feature of the pitch estimation algorithm is that the initial pitch estimation
algorithm uses a different window than the pitch refinement algorithm. The window
used for initial pitch estimation, $w_I(n)$, is 301 samples long and is given in Annex B.
The window used for pitch refinement (and also for spectral amplitude estimation
and V/UV determination), $w_R(n)$, is 221 samples long and is given in Annex C.

Fig. 8: Window Alignment

Throughout this document the window functions are assumed to be equal to zero outside
the range given in the Annexes. The center points of the two windows must coincide; therefore
the first non-zero point of $w_R(n)$ must begin 40 samples after the first non-zero point of
$w_I(n)$. This constraint is typically met by adopting the convention that $w_R(n) = w_R(-n)$
and $w_I(n) = w_I(-n)$, as shown in Figure 8. The amount of overlap between neighboring
speech segments is a function of the window length. Specifically the overlap is equal to the
window length minus the distance between frames (160 samples). Therefore the overlap
when using $w_R(n)$ is equal to 61 samples and the overlap when using $w_I(n)$ is equal to 141
samples.
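
The window arithmetic above can be checked with a few lines of Python. The constant and function names are assumptions introduced for this sketch. The lengths (301 and 221 samples) and the 160 sample frame shift are taken from the text.

```python
FRAME_SHIFT = 160                          # samples between frames (20 ms at 8 kHz)
WINDOW_LENGTHS = {"w_I": 301, "w_R": 221}  # window lengths given in the text

def overlap(window_length, frame_shift=FRAME_SHIFT):
    """Overlap between neighboring segments: window length minus frame shift."""
    return window_length - frame_shift

def start_offset(longer, shorter):
    """Samples by which the shorter window starts after the longer one
    when the two windows share the same center point."""
    return (longer - shorter) // 2
```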

5.1.1 Determination of E(P)

To obtain the initial pitch estimate an error function, $E(P)$, is evaluated for every $P$ in
the set {21, 21.5, ... 121.5, 122}. Pitch tracking is then used to compare the evaluations
of $E(P)$, and the best candidate from this set is chosen as $\hat{P}_I$. This procedure is shown in
Figure 9. The function $E(P)$ is defined by

$$ E(P) = \frac{\sum_{j=-150}^{150} s_{LPF}^2(j)\,w_I^2(j) \;-\; P \cdot \sum_{n=-\lfloor 150/P \rfloor}^{\lfloor 150/P \rfloor} r(n \cdot P)}{\left[\sum_{j=-150}^{150} s_{LPF}^2(j)\,w_I^2(j)\right]\left[1 - P \cdot \sum_{j=-150}^{150} w_I^4(j)\right]} \tag{5} $$

where $w_I(n)$ is normalized to meet the constraint

$$ \sum_{j=-150}^{150} w_I^2(j) = 1.0 \tag{6} $$

This constraint is satisfied for $w_I(n)$ listed in Annex B. The function $r(t)$ is defined
for integer values of $t$ by

$$ r(t) = \sum_{j=-150}^{150} s_{LPF}(j)\,w_I^2(j)\,s_{LPF}(j+t)\,w_I^2(j+t) \tag{7} $$

The function $r(t)$ is evaluated at non-integer values of $t$ through linear interpolation:

$$ r(t) = (1 + \lfloor t \rfloor - t) \cdot r(\lfloor t \rfloor) + (t - \lfloor t \rfloor) \cdot r(\lfloor t \rfloor + 1) \tag{8} $$

where $\lfloor x \rfloor$ is equal to the largest integer less than or equal to $x$ (i.e. truncating values of
$x$). The low-pass filtered speech signal is given by

$$ s_{LPF}(n) = \sum_{j=-10}^{10} s(n-j)\,h_{LPF}(j) \tag{9} $$

where $h_{LPF}(n)$ is the 21 point FIR filter given in Annex D.
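
The autocorrelation of Equation (7) and its interpolation in Equation (8) can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation. It assumes the product of the low-pass filtered speech and the squared window, s_LPF(j) w_I(j)^2, has already been formed and stored in a dictionary keyed by the sample index j, with missing indices treated as zero. The names `autocorr` and `autocorr_interp` are hypothetical.

```python
import math

def autocorr(x, lag, half_len=150):
    """Equation (7) for an integer lag.

    x maps sample index j to s_LPF(j) * w_I(j)**2; indices that are absent
    are taken as zero, matching a window that is zero outside its range.
    """
    return sum(x.get(j, 0.0) * x.get(j + lag, 0.0)
               for j in range(-half_len, half_len + 1))

def autocorr_interp(x, t, half_len=150):
    """Equation (8): linear interpolation of r(t) between integer lags."""
    lo = math.floor(t)
    frac = t - lo
    if frac == 0.0:
        return autocorr(x, lo, half_len)
    return ((1.0 - frac) * autocorr(x, lo, half_len)
            + frac * autocorr(x, lo + 1, half_len))
```

Because the candidate pitches lie on a half-sample grid, the products n·P in Equation (5) fall either on an integer lag or exactly halfway between two integer lags.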

The theoretical justification for the error function $E(P)$ is presented in [3, 6]. The initial
pitch estimate $\hat{P}_I$ is chosen such that $E(\hat{P}_I)$ is small; however, $\hat{P}_I$ is not chosen simply to
minimize $E(P)$. Instead pitch tracking must be used to account for pitch continuity between
neighboring speech frames.

5.1.2 Pitch Tracking 

Pitch tracking is used to improve the pitch estimate by attempting to limit the pitch deviation
between consecutive frames. If the pitch estimate is chosen to strictly minimize $E(P)$,
then the pitch estimate may change abruptly between succeeding frames. This abrupt
change in the pitch can cause degradation in the synthesized speech. In addition, pitch
typically changes slowly; therefore, the pitch estimates from neighboring frames can aid in
estimating the pitch of the current frame.

Fig. 9: Initial Pitch Estimation

For each speech frame two different pitch estimates are computed. The first, $\hat{P}_B$, is a
backward estimate which maintains pitch continuity with previous speech frames. The second,
$\hat{P}_F$, is a forward estimate which maintains pitch continuity with future speech frames.
The backward pitch estimate is calculated with the look-back pitch tracking algorithm,
while the forward pitch estimate is calculated with the look-ahead pitch tracking algorithm.
These two estimates are compared with a set of decision rules defined below, and either the
backward or forward estimate is chosen as the initial pitch estimate, $\hat{P}_I$.


5.1.3 Look-Back Pitch Tracking 

Let $\hat{P}_{-1}$ and $\hat{P}_{-2}$ denote the initial pitch estimates which are calculated during the analysis
of the previous two speech frames. Let $E_{-1}(P)$ and $E_{-2}(P)$ denote the error functions of
Equation (5) obtained from the analysis of these previous two frames as shown in Figure 7.
Then $E_{-1}(\hat{P}_{-1})$ and $E_{-2}(\hat{P}_{-2})$ will have some specific values. Upon initialization the error
functions $E_{-1}(P)$ and $E_{-2}(P)$ are assumed to be equal to zero, and $\hat{P}_{-1}$ and $\hat{P}_{-2}$ are
assumed to be equal to 100.

Since pitch continuity with previous frames is desired, the pitch for the current speech
frame is considered in a range near $\hat{P}_{-1}$. First, the error function $E(P)$ is evaluated at each
value of $P$ which satisfies constraints (10) and (11).

$$ .8\,\hat{P}_{-1} \le P \le 1.2\,\hat{P}_{-1} \tag{10} $$

$$ P \in \{21, 21.5, \ldots, 121.5, 122\} \tag{11} $$

These values of $E(P)$ are compared and $\hat{P}_B$ is defined as the value of $P$ which satisfies these
constraints and which minimizes $E(P)$. The backward cumulative error $CE_B(\hat{P}_B)$ is then
computed using the following formula:

$$ CE_B(\hat{P}_B) = E(\hat{P}_B) + E_{-1}(\hat{P}_{-1}) + E_{-2}(\hat{P}_{-2}) \tag{12} $$

The backward cumulative error provides a confidence measure for the backward pitch estimate.
It is compared against the forward cumulative error using a set of heuristics defined
in Section 5.1.4. This comparison determines whether the forward pitch estimate or the
backward pitch estimate is selected as the initial pitch estimate for the current frame.
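
The look-back step can be illustrated with a short Python sketch. It assumes the error function of the current frame is available as a dictionary from candidate pitch to E(P), and that the two stored error values from the previous frames are passed in directly. The function name `look_back` and this data layout are assumptions for the example only.

```python
def look_back(error, prev_pitch, prev_errors):
    """Backward pitch estimate and backward cumulative error.

    error:       dict mapping each candidate pitch P to E(P) for the current frame
    prev_pitch:  initial pitch estimate of the previous frame
    prev_errors: pair holding the stored error values of the two previous
                 frames, each evaluated at that frame's own pitch estimate
    """
    # Constraints (10) and (11): candidates within 20 percent of the previous pitch.
    in_range = [p for p in error if 0.8 * prev_pitch <= p <= 1.2 * prev_pitch]
    p_b = min(in_range, key=lambda p: error[p])
    # Equation (12): cumulative error over the current and two previous frames.
    ce_b = error[p_b] + sum(prev_errors)
    return p_b, ce_b
```

With the initialization described above (previous pitch estimates of 100 and previous error functions of zero), the first frame searches the range 80 to 120.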

5.1.4 Look-Ahead Pitch Tracking 

Look-ahead tracking attempts to preserve pitch continuity between future speech frames.
Let $E_1(P)$ and $E_2(P)$ denote the error functions of Equation (5) obtained from the two
future speech frames as shown in Figure 7. Since the pitch has not been determined for
these future frames, the look-ahead pitch tracking algorithm must select the pitch of these
future frames. This is done in the following manner. First, $P_0$ is assumed to be fixed. Then
the $P_1$ and $P_2$ are found which jointly minimize $E_1(P_1) + E_2(P_2)$, subject to constraints
(13) through (16).

$$ P_1 \in \{21, 21.5, \ldots, 121.5, 122\} \tag{13} $$

$$ .8 \cdot P_0 \le P_1 \le 1.2 \cdot P_0 \tag{14} $$

$$ P_2 \in \{21, 21.5, \ldots, 121.5, 122\} \tag{15} $$

$$ .8 \cdot P_1 \le P_2 \le 1.2 \cdot P_1 \tag{16} $$

The values of $P_1$ and $P_2$ which jointly minimize $E_1(P_1) + E_2(P_2)$ subject to these constraints
are denoted by $\hat{P}_1$ and $\hat{P}_2$, respectively. Once $\hat{P}_1$ and $\hat{P}_2$ have been computed the forward
cumulative error function $CE_F(P_0)$ is computed according to:

$$ CE_F(P_0) = E(P_0) + E_1(\hat{P}_1) + E_2(\hat{P}_2) \tag{17} $$

This process is repeated for each $P_0$ in the set {21, 21.5, ... 121.5, 122}. The corresponding
values of $CE_F(P_0)$ are compared and $\hat{P}_0$ is defined as the value of $P_0$ in this set which
results in the minimum value of $CE_F(P_0)$. Note that references [3, 6] should be consulted
for more information on the theory and implementation of the look-ahead pitch tracking
algorithm.
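
A direct, unoptimized reading of constraints (13) through (16) and Equation (17) is sketched below in Python. It assumes the three error functions are available as dictionaries from candidate pitch to error value. The exhaustive search is chosen for clarity and is not claimed to be how practical implementations organize the computation. The function names are hypothetical.

```python
def forward_cumulative_error(p0, e0, e1, e2):
    """Equation (17) for one fixed P0.

    e0, e1, e2 map candidate pitches to E(P), E_1(P) and E_2(P).
    """
    best = float("inf")
    for p1 in e1:
        if not 0.8 * p0 <= p1 <= 1.2 * p0:       # constraint (14)
            continue
        for p2 in e2:
            if 0.8 * p1 <= p2 <= 1.2 * p1:       # constraint (16)
                best = min(best, e1[p1] + e2[p2])
    return e0[p0] + best

def look_ahead(e0, e1, e2):
    """Return the P0 minimizing CE_F, together with CE_F for every candidate."""
    ce_f = {p0: forward_cumulative_error(p0, e0, e1, e2) for p0 in e0}
    p0_hat = min(ce_f, key=ce_f.get)
    return p0_hat, ce_f
```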

Once $\hat{P}_0$ has been found, the integer sub-multiples of $\hat{P}_0$ (i.e. $\hat{P}_0/2$, $\hat{P}_0/3$, ..., $\hat{P}_0/n$)
must be considered. Every sub-multiple which is greater than or equal to 21 is computed and
replaced with the closest member of the set {21, 21.5, ... 121.5, 122} (where closeness is
measured with mean-square error). Sub-multiples which are less than 21 are disregarded.

The smallest of these sub-multiples is checked against constraints (18), (19) and (20).
If this sub-multiple satisfies any of these constraints then it is selected as the forward pitch
estimate, $\hat{P}_F$. Otherwise the next largest sub-multiple is checked against these constraints,
and it is selected as the forward pitch estimate if it satisfies any of these constraints. This
process continues until all pitch sub-multiples have been tested against these constraints.
If no pitch sub-multiple satisfies any of these constraints then $\hat{P}_F = \hat{P}_0$. Note that this
procedure will always select the smallest sub-multiple which satisfies any of these constraints
as the forward pitch estimate.

$$ CE_F\!\left(\frac{\hat{P}_0}{n}\right) \le .85 \quad \text{and} \quad \frac{CE_F(\hat{P}_0/n)}{CE_F(\hat{P}_0)} \le 1.7 \tag{18} $$

$$ CE_F\!\left(\frac{\hat{P}_0}{n}\right) \le .4 \quad \text{and} \quad \frac{CE_F(\hat{P}_0/n)}{CE_F(\hat{P}_0)} \le 3.5 \tag{19} $$

$$ CE_F\!\left(\frac{\hat{P}_0}{n}\right) \le .05 \tag{20} $$
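
The sub-multiple check can be sketched as follows. The sketch assumes the forward cumulative error is supplied as a callable `ce_f`, and it rewrites the ratio tests of constraints (18) and (19) as products so that a zero value of the forward cumulative error at the full pitch does not cause a division by zero. That rewriting, and the helper names, are choices made for this example.

```python
CANDIDATES = [21 + 0.5 * i for i in range(203)]

def nearest_candidate(p):
    """Closest member of {21, 21.5, ..., 122} in the squared-error sense."""
    return min(CANDIDATES, key=lambda c: (c - p) ** 2)

def forward_estimate(p0_hat, ce_f):
    """Select the forward pitch estimate from the sub-multiples of p0_hat.

    ce_f is a callable returning the forward cumulative error for a pitch.
    """
    base = ce_f(p0_hat)
    subs = set()
    n = 2
    while p0_hat / n >= 21:                      # sub-multiples below 21 are disregarded
        subs.add(nearest_candidate(p0_hat / n))
        n += 1
    for p in sorted(subs):                       # smallest sub-multiple first
        ce = ce_f(p)
        if ((ce <= 0.85 and ce <= 1.7 * base)    # constraint (18)
                or (ce <= 0.4 and ce <= 3.5 * base)   # constraint (19)
                or ce <= 0.05):                  # constraint (20)
            return p
    return p0_hat
```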


Once the forward pitch estimate and the backward pitch estimate have both been computed
the forward cumulative error and the backward cumulative error are compared. Depending
on the result of this comparison either $\hat{P}_F$ or $\hat{P}_B$ will be selected as the initial pitch
estimate $\hat{P}_I$. The following set of decision rules is used to select the initial pitch estimate
from among these two candidates:

If

$$ CE_B(\hat{P}_B) \le .48, \text{ then } \hat{P}_I = \hat{P}_B \tag{21} $$

Else if

$$ CE_B(\hat{P}_B) \le CE_F(\hat{P}_F), \text{ then } \hat{P}_I = \hat{P}_B \tag{22} $$

Else

$$ \hat{P}_I = \hat{P}_F \tag{23} $$

Fig. 10: Pitch Refinement

The flow charts in Annex K should be examined for more information on initial
pitch estimation. Note that the initial pitch estimate, $\hat{P}_I$, is a member of the set {21, 21.5,
... 121.5, 122}, and therefore it has half-sample accuracy.
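
Read as an if / else-if / else chain, decision rules (21) through (23) reduce to a few lines. The Python sketch below is illustrative, and the function and argument names are assumptions.

```python
def select_initial_pitch(p_b, ce_b, p_f, ce_f):
    """Choose between the backward and forward pitch estimates.

    p_b, ce_b: backward estimate and its cumulative error
    p_f, ce_f: forward estimate and its cumulative error
    """
    if ce_b <= 0.48:        # rule (21): backward estimate is good enough
        return p_b
    if ce_b <= ce_f:        # rule (22): backward error no worse than forward
        return p_b
    return p_f              # rule (23)
```

The first rule gives preference to the backward estimate whenever its cumulative error is small, regardless of the forward result.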

5.1.5 Pitch Refinement 

The pitch refinement algorithm improves the resolution of the pitch estimate from one half
sample to one quarter sample. Ten candidate pitches are formed from the initial pitch
estimate. These are $\hat{P}_I - \frac{9}{8}$, $\hat{P}_I - \frac{7}{8}$, ..., $\hat{P}_I + \frac{7}{8}$, and $\hat{P}_I + \frac{9}{8}$. These candidates are converted
to their equivalent fundamental frequency using Equation (4). The error function $E_R(\omega_0)$,
defined in Equation (24), is evaluated for each candidate fundamental frequency $\omega_0$. The
candidate fundamental frequency which results in the minimum value of $E_R(\omega_0)$ is selected
as the refined fundamental frequency $\hat{\omega}_0$. A block diagram of this process is shown in
Figure 10.

$$ E_R(\omega_0) = \sum_{m=50}^{[\ldots]} \left| S_w(m) - S_w(m,\omega_0) \right|^2 \tag{24} $$
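
The ten refinement candidates can be enumerated with a short Python sketch, assuming they are spaced one quarter sample apart and placed symmetrically about the initial estimate, which is what ten candidates at quarter-sample resolution imply. The function name is hypothetical, and exact fractions are used only to avoid rounding in the offsets.

```python
from fractions import Fraction

def refinement_candidates(p_initial):
    """Ten candidate pitches at offsets -9/8, -7/8, ..., +7/8, +9/8 samples."""
    offsets = [Fraction(k, 8) for k in range(-9, 10, 2)]
    return [float(Fraction(p_initial) + off) for off in offsets]
```

Each candidate would then be converted to a fundamental frequency with Equation (4) before the error function is evaluated.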


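The synthetic spectrum $S_w(m,\omega_0)$ is given by

$$ S_w(m,\omega_0) = \begin{cases} A_0(\omega_0)\,W_R(64m) & \text{for } \lceil a_0 \rceil \le m < \lceil b_0 \rceil \\ A_1(\omega_0)\,W_R\!\left(\lfloor 64m - \frac{16384}{2\pi}\,\omega_0 + .5 \rfloor\right) & \text{for } \lceil a_1 \rceil \le m < \lceil b_1 \rceil \\ \quad\vdots \\ A_l(\omega_0)\,W_R\!\left(\lfloor 64m - \frac{16384}{2\pi}\,l\,\omega_0 + .5 \rfloor\right) & \text{for } \lceil a_l \rceil \le m < \lceil b_l \rceil \end{cases} \tag{25} $$

where $a_l$, $b_l$ and $A_l$ are defined in equations (26) thru (28), respectively. The notation $\lceil x \rceil$
denotes the smallest integer greater than or equal to $x$.

$$ a_l = \frac{256}{2\pi}\,(l - .5)\,\omega_0 \tag{26} $$

$$ b_l = \frac{256}{2\pi}\,(l + .5)\,\omega_0 \tag{27} $$

$$ A_l(\omega_0) = \frac{\sum_{m=\lceil a_l \rceil}^{\lceil b_l \rceil - 1} S_w(m)\,W_R^*\!\left(\lfloor 64m - \frac{16384}{2\pi}\,l\,\omega_0 + .5 \rfloor\right)}{\sum_{m=\lceil a_l \rceil}^{\lceil b_l \rceil - 1} \left| W_R\!\left(\lfloor 64m - \frac{16384}{2\pi}\,l\,\omega_0 + .5 \rfloor\right) \right|^2} \tag{28} $$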

The function $S_w(m)$ refers to the 256 point Discrete Fourier Transform of $s(n) \cdot w_R(n)$, and
$W_R(m)$ refers to the 16384 point Discrete Fourier Transform of $w_R(n)$. These relationships
are expressed below. Reference [11] should be consulted for more information on the DFT.

$$ S_w(m) = \sum_{n=-110}^{110} s(n)\,w_R(n)\,e^{-j\frac{2\pi m n}{256}} \quad \text{for } -127 \le m \le 128 \tag{29} $$

$$ W_R(m) = \sum_{n=-110}^{110} w_R(n)\,e^{-j\frac{2\pi m n}{16384}} \quad \text{for } -8191 \le m \le 8192 \tag{30} $$

The notation $W_R^*(m)$ refers to the complex conjugate of $W_R(m)$. However, since $w_R(n)$ is
a real symmetric sequence, $W_R^*(m) = W_R(m)$.

Fig. 11: IMBE Voiced/Unvoiced Determination


Once the refined fundamental frequency has been selected from among the ten candidates,
it is used to compute the number of harmonics in the current segment, $L$, according
to the relationship:

$$ L = \left\lfloor .9254 \left\lfloor \frac{\pi}{\hat{\omega}_0} + .25 \right\rfloor \right\rfloor \tag{31} $$

Due to the limits on $\hat{\omega}_0$, Equation (31) confines $L$ to the range $9 \le L \le 56$. Once this
equation has been computed, the parameters $\hat{a}_l$ and $\hat{b}_l$ for $1 \le l \le L$ are computed from $\hat{\omega}_0$
according to equations (32) and (33), respectively.

$$ \hat{a}_l = \frac{256}{2\pi}\,(l - .5)\,\hat{\omega}_0 \tag{32} $$

$$ \hat{b}_l = \frac{256}{2\pi}\,(l + .5)\,\hat{\omega}_0 \tag{33} $$

5.2 Voiced/Unvoiced Determination 


The voiced/unvoiced (V/UV) decisions, $v_k$ for $1 \le k \le K$, are found by dividing the
spectrum into $K$ frequency bands and evaluating a voicing measure, $D_k$, for each band.
The number of frequency bands is a function of $L$ and is given by:

$$ K = \begin{cases} \left\lfloor \frac{L+2}{3} \right\rfloor & \text{if } L \le 36 \\ 12 & \text{otherwise} \end{cases} \tag{34} $$
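
The band count of Equation (34) and the resulting grouping of harmonics can be sketched in Python. The grouping assumes, as the band definitions in this section indicate, that band k covers harmonics 3k-2 through 3k and that the highest band extends to harmonic L. The function names are hypothetical.

```python
def band_count(num_harmonics):
    """Equation (34): number of V/UV frequency bands K for L harmonics."""
    if num_harmonics <= 36:
        return (num_harmonics + 2) // 3
    return 12

def band_harmonics(k, num_harmonics):
    """Harmonic indices covered by band k (1-based)."""
    total_bands = band_count(num_harmonics)
    first = 3 * k - 2
    last = 3 * k if k < total_bands else num_harmonics
    return list(range(first, last + 1))
```

With L = 10 the fourth and highest band holds a single harmonic, while with L = 56 the twelfth band holds harmonics 34 through 56.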

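The voicing measure for $1 \le k \le K - 1$ is given by

$$ D_k = \frac{\sum_{m=\lceil \hat{a}_{3k-2} \rceil}^{\lceil \hat{b}_{3k} \rceil - 1} \left| S_w(m) - S_w(m,\hat{\omega}_0) \right|^2}{\sum_{m=\lceil \hat{a}_{3k-2} \rceil}^{\lceil \hat{b}_{3k} \rceil - 1} \left| S_w(m) \right|^2} \tag{35} $$

where $\hat{\omega}_0$ is the refined fundamental frequency, and $\hat{a}_l$, $\hat{b}_l$, $S_w(m)$, and $S_w(m,\hat{\omega}_0)$ are defined
in Section 5.1.5. Similarly, the voicing measure for the highest frequency band is given by

$$ D_K = \frac{\sum_{m=\lceil \hat{a}_{3K-2} \rceil}^{\lceil \hat{b}_{L} \rceil - 1} \left| S_w(m) - S_w(m,\hat{\omega}_0) \right|^2}{\sum_{m=\lceil \hat{a}_{3K-2} \rceil}^{\lceil \hat{b}_{L} \rceil - 1} \left| S_w(m) \right|^2} \tag{36} $$

The parameters $D_k$ for $1 \le k \le K$ are compared with a threshold function $\Theta_\xi(k,\hat{\omega}_0)$ given
by:

$$ \Theta_\xi(k,\hat{\omega}_0) = \begin{cases} 0 & \text{if } E(\hat{P}_I) > .5 \text{ and } k \ge 2 \\ .5625\,[1.0 - .3096\,(k-1)\,\hat{\omega}_0] \cdot M(\xi) & \text{else if } v_k(-1) = 1 \\ .45\,[1.0 - .3096\,(k-1)\,\hat{\omega}_0] \cdot M(\xi) & \text{otherwise} \end{cases} \tag{37} $$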

where $M(\xi)$ is an energy dependent function which is computed from a set of local energy
parameters and $v_k(-1)$ is the value of the k'th V/UV decision for the previous frame.
Evaluation of this threshold function requires the parameters $\xi_{LF}$, $\xi_{HF}$, and $\xi_0$ to be computed
for the current segment in the following manner, where the value $W_R(0)$ is found by
evaluating Equation (30) at $m = 0$.

$$ \xi_{LF} = \frac{\sum_{m=[\ldots]}^{[\ldots]} \left| S_w(m) \right|^2}{\left| W_R(0) \right|^2} \tag{38} $$

$$ \xi_{HF} = \frac{\sum_{m=[\ldots]}^{[\ldots]} \left| S_w(m) \right|^2}{\left| W_R(0) \right|^2} \tag{39} $$

$$ \xi_0 = \xi_{LF} + \xi_{HF} \tag{40} $$

These parameters are then used to update the parameter $\xi_{max}$ according to the rules presented
below. Throughout this section the notation $\xi_{max}(0)$ or $\xi_{max}$ is used to refer to the
value of the parameter in the current frame, while the notation $\xi_{max}(-1)$ is used to refer
to the value of the parameter in the previous frame.

$$ \xi_{max}(0) = \begin{cases} .5\,\xi_{max}(-1) + .5\,\xi_0 & \text{if } \xi_0 > \xi_{max}(-1) \\ .99\,\xi_{max}(-1) + .01\,\xi_0 & \text{else if } .99\,\xi_{max}(-1) + .01\,\xi_0 > 20000 \\ 20000 & \text{otherwise} \end{cases} \tag{41} $$
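
The update of the energy maximum can be written as a small Python function. This sketch assumes the first case of Equation (41) applies when the current energy strictly exceeds the previous maximum. The relational sign is hard to read in the source, so that choice is an assumption, as are the function and argument names.

```python
def update_xi_max(xi_max_prev, xi_0, floor=20000.0):
    """Update the tracked energy maximum for the current frame.

    xi_max_prev: value of the maximum in the previous frame
    xi_0:        total energy of the current frame
    """
    if xi_0 > xi_max_prev:
        # Fast attack: move halfway toward the new, larger energy.
        return 0.5 * xi_max_prev + 0.5 * xi_0
    # Slow decay, never dropping below the floor value.
    decayed = 0.99 * xi_max_prev + 0.01 * xi_0
    return decayed if decayed > floor else floor
```

The asymmetry between the two coefficients lets the tracked maximum rise quickly on loud frames and fall slowly during quiet ones.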



The completed set of energy parameters for the current frame is used to calculate the
function $M(\xi)$ as shown below.

$$ M(\xi) = \begin{cases} \dfrac{.0025\,\xi_{max} + \xi_0}{.01\,\xi_{max} + \xi_0} & \text{if } \xi_{LF} \ge 5\,\xi_{HF} \\ [\ldots] & \text{otherwise} \end{cases} \tag{42} $$

Fig. 12: IMBE Frequency Band Structure

This function is then used in Equation (37) to calculate the V/UV threshold function. If $D_k$ is less than the threshold function then the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared voiced; otherwise this frequency band is declared unvoiced. A block diagram of this procedure is shown in Figure 11. The adopted convention is that if the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared voiced, then $\hat{v}_k = 1$. Alternatively, if the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared unvoiced, then $\hat{v}_k = 0$.

With the exception of the highest frequency band, the width of each frequency band is equal to $3\hat{\omega}_0$. Therefore all but the highest frequency band contain three harmonics of the refined fundamental frequency. The highest frequency band (as defined by Equation (34)) may contain more or fewer than three harmonics of the fundamental frequency. If a particular frequency band is declared voiced, then all of the harmonics within that band are defined to be voiced harmonics. Similarly, if a particular frequency band is declared unvoiced, then all of the harmonics within that band are defined to be unvoiced harmonics.

Fig. 13: IMBE Spectral Amplitude Estimation
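The V/UV decision rule of Equations (37) and (41) can be sketched as follows. This is a minimal illustration, not a reference implementation: the function and argument names are hypothetical, the value of $M(\xi)$ is passed in by the caller, and the inclusive bound `k >= 2` and the strict comparison `xi_0 > xi_max_prev` are assumptions about the inequalities in those equations.

```python
def update_xi_max(xi_max_prev: float, xi_0: float) -> float:
    """Equation (41): fast attack, slow decay, floored at 20000."""
    if xi_0 > xi_max_prev:
        return 0.5 * xi_max_prev + 0.5 * xi_0
    decayed = 0.99 * xi_max_prev + 0.01 * xi_0
    return decayed if decayed > 20000.0 else 20000.0


def vuv_threshold(k: int, omega_0: float, m_xi: float,
                  pitch_error: float, v_prev: int) -> float:
    """Equation (37): threshold for band k (1-based).

    pitch_error stands for E(P_I); v_prev is the previous frame's
    decision for the same band; m_xi is M(xi), supplied by the caller.
    """
    if pitch_error > 0.5 and k >= 2:
        return 0.0
    scale = 0.5625 if v_prev == 1 else 0.45
    return scale * (1.0 - 0.3096 * (k - 1) * omega_0) * m_xi


def vuv_decision(d_k: float, threshold: float) -> int:
    """A band is voiced (1) when its voicing measure is below the threshold."""
    return 1 if d_k < threshold else 0
```

The larger scale factor for a band that was voiced in the previous frame gives the decision some hysteresis, so a band does not flip state on small changes in $D_k$.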


5.3 Estimation of the Spectral Amplitudes 


Once the V/UV decisions have been determined the spectral envelope can be estimated as shown in Figure 13. In the IMBE speech coder the spectral envelope in the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is specified by 3 spectral amplitudes, which are denoted by $\hat{M}_{3k-2}$, $\hat{M}_{3k-1}$ and $\hat{M}_{3k}$. The relationship between the frequency bands and the spectral amplitudes is shown in Figure 12. If the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared voiced, then $\hat{M}_{3k-2}$, $\hat{M}_{3k-1}$, and $\hat{M}_{3k}$ are estimated by,


M t = 


Y' lAl-i 

m.— f a i 


\ u l\~ A - I Q ( 
m=\ai 1 I 0 ™* 




for $l$ in the range $3k - 2 \le l \le 3k$. Alternatively, if the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared unvoiced, then $\hat{M}_{3k-2}$, $\hat{M}_{3k-1}$, and $\hat{M}_{3k}$ are estimated according to:

$$\hat{M}_l = \frac{1}{\sum_{n=-110}^{110} w_R(n)} \left[ \frac{\sum_{m=\lceil \hat{a}_l \rceil}^{\lceil \hat{b}_l \rceil - 1} |S_w(m)|^2}{\lceil \hat{b}_l \rceil - \lceil \hat{a}_l \rceil} \right]^{1/2} \tag{44}$$

for $l$ in the range $3k - 2 \le l \le 3k$.

This procedure must be modified slightly for the highest frequency band which covers the frequency interval $\hat{a}_{3\hat{K}-2} \le \omega < \hat{b}_{\hat{L}}$. The spectral envelope in this frequency band is represented by $\hat{L} - 3\hat{K} + 3$ spectral amplitudes, denoted $\hat{M}_{3\hat{K}-2}$, $\hat{M}_{3\hat{K}-1}$, ..., $\hat{M}_{\hat{L}}$. If this frequency band is declared voiced then these spectral amplitudes are estimated using equation (43) for $3\hat{K} - 2 \le l \le \hat{L}$. Alternatively, if this frequency band is declared unvoiced then these spectral amplitudes are estimated using equation (44) for $3\hat{K} - 2 \le l \le \hat{L}$.

As described above, the spectral amplitudes $\hat{M}_l$ are estimated in the range $1 \le l \le \hat{L}$, where $\hat{L}$ is given in Equation (31). Note that the lowest frequency band, $\hat{a}_1 \le \omega < \hat{b}_3$, is specified by $\hat{M}_1$, $\hat{M}_2$, and $\hat{M}_3$. The D.C. spectral amplitude, $\hat{M}_0$, is ignored in the IMBE speech coder and can be assumed to be zero.

Parameter                     Number of Bits
Fundamental Frequency         8
Voiced/Unvoiced Decisions     $\hat{K}$
Spectral Amplitudes           $79 - \hat{K}$
Synchronization               1

Table 1: Bit Allocation Among Model Parameters

6 Parameter Encoding and Decoding 

The analysis of each speech frame generates a set of model parameters consisting of the
fundamental frequency, $\hat{\omega}_0$, the V/UV decisions, $\hat{v}_k$ for $1 \le k \le \hat{K}$, and the spectral
amplitudes, $\hat{M}_l$ for $1 \le l \le \hat{L}$. Since the Project 25 speech coder is designed to operate at
7.2 kbps with a 20 ms frame length, 144 bits per frame are available for encoding the model
parameters. Of these 144 bits, 56 are reserved for error control as is discussed in Section 7 of 
this document, and the remaining 88 bits are divided among the model parameters as shown 
in Table 1. This section describes the manner in which these bits are used to quantize, 
encode, decode and reconstruct the model parameters. In Section 6.1 the encoding and 
decoding of the fundamental frequency is discussed, while Section 6.2 discusses the encoding 
and decoding of the V/UV decisions. Section 6.3 discusses the quantization and encoding 
of the spectral amplitudes, and Section 6.4 discusses the decoding and reconstruction of the 
spectral amplitudes. Reference [7] provides general information on many of the techniques 
used in this section. 

6.1 Fundamental Frequency Encoding and Decoding 

The fundamental frequency is estimated with one-quarter sample resolution in the interval $\frac{2\pi}{123.125} \le \hat{\omega}_0 \le \frac{2\pi}{19.875}$; however, it is only encoded at half-sample resolution. This is accomplished by finding the value of $\hat{b}_0$ which satisfies:

$$\hat{b}_0 = \left\lfloor \frac{4\pi}{\hat{\omega}_0} - 39 \right\rfloor \tag{45}$$

value    bits
0        0000 0000
1        0000 0001
2        0000 0010
...      ...
255      1111 1111

Table 2: Eight Bit Binary Representation


The quantizer value $\hat{b}_0$ is represented with 8 bits using the unsigned binary representation shown in Table 2. This representation is used throughout the IMBE speech coder to convert quantized values into a specific bit pattern.

The fundamental frequency is decoded and reconstructed at the receiver by using Equation (46) to convert $\tilde{b}_0$ to the received fundamental frequency $\tilde{\omega}_0$. In addition $\tilde{b}_0$ is used to calculate $\tilde{K}$ and $\tilde{L}$, the number of V/UV decisions and the number of spectral amplitudes, respectively. These relationships are given in Equations (47) and (48).


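$$\tilde{\omega}_0 = \frac{4\pi}{\tilde{b}_0 + 39.5} \tag{46}$$

$$\tilde{L} = \left\lfloor .9254 \left\lfloor \frac{\pi}{\tilde{\omega}_0} + .25 \right\rfloor \right\rfloor \tag{47}$$

$$\tilde{K} = \begin{cases} \lfloor (\tilde{L}+2)/3 \rfloor & \text{if } \tilde{L} \le 36 \\ 12 & \text{otherwise} \end{cases} \tag{48}$$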


Since $\tilde{K}$ and $\tilde{L}$ control subsequent bit allocation by the receiver, it is important that they equal $\hat{K}$ and $\hat{L}$, respectively. This occurs if there are no uncorrectable bit errors in the six most significant bits (MSB) of $\hat{b}_0$. For this reason these six bits are well protected by the error correction scheme discussed in Section 7. A block diagram of the fundamental frequency encoding and decoding process is shown in Figure 14.

Since the pitch estimation algorithm described in Section 5.1 restricts the range of $\hat{\omega}_0$ to $\frac{2\pi}{123.125} \le \hat{\omega}_0 \le \frac{2\pi}{19.875}$, the value of $\hat{b}_0$ computed according to Equation (45) is limited to the range $0 \le \hat{b}_0 \le 207$. The use of 8 bits to represent $\hat{b}_0$ leaves 48 values of $\hat{b}_0$ (i.e. $208 \le \hat{b}_0 \le 255$) which are outside the valid range of pitch values. These 48 values are reserved for future use.

Fig. 14: Fundamental Frequency Encoding and Decoding
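The quantizer of Equation (45) and the receiver relationships of Equations (46) through (48) can be sketched as below. This is an illustrative sketch only; the function names are hypothetical, and rejecting the reserved values 208 through 255 with an exception is an assumption about how an implementation might treat them.

```python
import math


def encode_b0(omega_0: float) -> int:
    """Equation (45): quantize the fundamental at half-sample resolution."""
    return math.floor(4.0 * math.pi / omega_0 - 39.0)


def decode_b0(b0: int):
    """Equations (46)-(48): recover omega_0, L and K from the 8 bit value."""
    if not 0 <= b0 <= 207:
        raise ValueError("b0 is outside the valid pitch range")
    omega_0 = 4.0 * math.pi / (b0 + 39.5)
    L = math.floor(0.9254 * math.floor(math.pi / omega_0 + 0.25))
    K = (L + 2) // 3 if L <= 36 else 12
    return omega_0, L, K
```

Because the decoder reconstructs at the centre of each half-sample cell (the 39.5 term), re-encoding a decoded value returns the same index, and the number of harmonics stays between 9 and 56 over the valid range.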

6.2 Voiced/Unvoiced Decision Encoding and Decoding 

The V/UV decisions $\hat{v}_k$ for $1 \le k \le \hat{K}$, are binary values which classify each frequency band as either voiced or unvoiced. These values are encoded using

$$\hat{b}_1 = \sum_{k=1}^{\hat{K}} \hat{v}_k \, 2^{\hat{K}-k} \tag{49}$$

The quantizer value $\hat{b}_1$ is represented with $\hat{K}$ bits using an unsigned binary representation which is analogous to that shown in Table 2.

At the receiver the $\tilde{K}$ bits corresponding to $\tilde{b}_1$ are decoded into the V/UV decisions $\tilde{v}_l$ for $1 \le l \le \tilde{L}$. Note that this is a departure from the V/UV convention used by the encoder, which used a single V/UV decision to represent an entire frequency band. Instead the decoder uses a separate V/UV decision for each spectral amplitude. The decoder performs this conversion by using $\tilde{b}_1$ to determine which frequency bands are voiced or unvoiced. The state of $\tilde{v}_l$ is then set depending upon whether the frequency $\omega = l \cdot \tilde{\omega}_0$ is within a voiced or unvoiced frequency band. This can be expressed mathematically as shown in the following two equations.

$$\tilde{k}_l = \begin{cases} \lfloor (l+2)/3 \rfloor & \text{if } l \le 36 \\ 12 & \text{otherwise} \end{cases} \tag{50}$$

$$\tilde{v}_l = \left\lfloor \frac{\tilde{b}_1}{2^{\tilde{K}-\tilde{k}_l}} \right\rfloor - 2 \left\lfloor \frac{\tilde{b}_1}{2^{\tilde{K}+1-\tilde{k}_l}} \right\rfloor \qquad \text{for } 1 \le l \le \tilde{L} \tag{51}$$

Fig. 15: V/UV Decision Encoding and Decoding

Fig. 16: Encoding of the Spectral Amplitudes


Figure 15 shows a block diagram of the V/UV decision encoding and decoding process. 
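A short sketch of the V/UV packing and the per-harmonic unpacking of Equations (50) and (51) follows. The names are hypothetical, and the packing order (band 1 in the most significant bit) is inferred from the way Equation (51) extracts bit $\tilde{K} - \tilde{k}_l$ of $\tilde{b}_1$.

```python
def encode_vuv(v):
    """Pack the K band decisions into b1, band 1 in the most significant bit."""
    K = len(v)
    return sum(bit << (K - k) for k, bit in enumerate(v, start=1))


def band_of_harmonic(l: int) -> int:
    """Equation (50): index of the band that contains harmonic l."""
    return (l + 2) // 3 if l <= 36 else 12


def decode_vuv(b1: int, K: int, L: int):
    """Equation (51): one V/UV decision per harmonic l = 1..L."""
    decisions = []
    for l in range(1, L + 1):
        k_l = band_of_harmonic(l)
        # floor(b1 / 2^(K-k_l)) - 2*floor(b1 / 2^(K+1-k_l)) isolates one bit.
        decisions.append((b1 >> (K - k_l)) - 2 * (b1 >> (K + 1 - k_l)))
    return decisions
```

For example, with $K = 3$ and $L = 9$ the band decisions `[1, 0, 1]` pack to 5 and unpack to three voiced harmonics, three unvoiced harmonics, and three voiced harmonics.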


6.3 Spectral Amplitudes Encoding 

The spectral amplitudes $\hat{M}_l$ for $1 \le l \le \hat{L}$, are real values which must be quantized prior to encoding. This is accomplished as shown in Figure 16, by forming the spectral amplitude prediction residuals $\hat{T}_l$ for $1 \le l \le \hat{L}$, according to Equations (52) through (57). The reader is referred to [6] for more information on this topic.

For the purpose of this discussion $\hat{L}(0)$ or $\hat{L}$ refer to the number of harmonics in the current frame, while $\hat{L}(-1)$ refers to the number of harmonics in the previous frame. Similarly, $\hat{M}_l(0)$ for $1 \le l \le \hat{L}(0)$ refers to the unquantized spectral amplitudes of the current frame, while $\tilde{M}_l(-1)$ for $1 \le l \le \hat{L}(-1)$ refers to the quantized spectral amplitudes of the previous frame.


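$$\hat{k}_l = \frac{\hat{L}(-1)}{\hat{L}(0)} \cdot l \tag{52}$$

$$\hat{\delta}_l = \hat{k}_l - \lfloor \hat{k}_l \rfloor \tag{53}$$

$$\hat{T}_l = \log_2 \hat{M}_l(0) - \rho\,(1 - \hat{\delta}_l) \log_2 \tilde{M}_{\lfloor \hat{k}_l \rfloor}(-1) - \rho\,\hat{\delta}_l \log_2 \tilde{M}_{\lfloor \hat{k}_l \rfloor + 1}(-1) + \frac{\rho}{\hat{L}(0)} \sum_{\lambda=1}^{\hat{L}(0)} \left[ (1 - \hat{\delta}_\lambda) \log_2 \tilde{M}_{\lfloor \hat{k}_\lambda \rfloor}(-1) + \hat{\delta}_\lambda \log_2 \tilde{M}_{\lfloor \hat{k}_\lambda \rfloor + 1}(-1) \right] \tag{54}$$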


The prediction coefficient, $\rho$, is adjusted each frame according to the following rule:

$$\rho = \begin{cases} .4 & \text{if } \hat{L}(0) \le 15 \\ .03\,\hat{L}(0) - .05 & \text{if } 15 < \hat{L}(0) \le 24 \\ .7 & \text{otherwise} \end{cases} \tag{55}$$

In order to form $\hat{T}_l$ using equations (52) through (55), the following assumptions are made:

$$\tilde{M}_0(-1) = 1.0 \tag{56}$$

$$\tilde{M}_l(-1) = \tilde{M}_{\hat{L}(-1)}(-1) \qquad \text{for } l > \hat{L}(-1) \tag{57}$$

Also upon initialization $\tilde{M}_l(-1)$ should be set equal to 1.0 for all $l$, and $\hat{L}(-1) = 30$.
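The formation of the prediction residuals in Equations (52) through (57) can be sketched as follows. This is a minimal sketch with hypothetical names; amplitudes are held in Python lists where index `l - 1` stores harmonic `l`, which is a representation chosen here for illustration.

```python
import math


def prediction_coefficient(L_cur: int) -> float:
    """Equation (55)."""
    if L_cur <= 15:
        return 0.4
    if L_cur <= 24:
        return 0.03 * L_cur - 0.05
    return 0.7


def interpolated_log2(prev_amps, L_cur: int):
    """Interpolate log2 of the previous amplitudes at k_l (Equations 52-53)."""
    L_prev = len(prev_amps)

    def log_amp(idx: int) -> float:
        if idx == 0:
            return 0.0                                  # Equation (56)
        return math.log2(prev_amps[min(idx, L_prev) - 1])  # Equation (57)

    out = []
    for l in range(1, L_cur + 1):
        k_l = L_prev * l / L_cur
        base = math.floor(k_l)
        delta = k_l - base
        out.append((1 - delta) * log_amp(base) + delta * log_amp(base + 1))
    return out


def prediction_residuals(cur_amps, prev_amps):
    """Equation (54): residual = log2 amplitude minus mean-removed prediction."""
    L_cur = len(cur_amps)
    rho = prediction_coefficient(L_cur)
    interp = interpolated_log2(prev_amps, L_cur)
    mean_interp = sum(interp) / L_cur
    return [math.log2(m) - rho * p + rho * mean_interp
            for m, p in zip(cur_amps, interp)]
```

With the initialization state (30 previous amplitudes all equal to 1.0) the prediction is zero, so the residuals reduce to the log2 amplitudes of the current frame.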

The $\hat{L}$ prediction residuals are then divided into 6 blocks. The length of each block, denoted $\hat{J}_i$ for $1 \le i \le 6$, is adjusted such that the following constraints are satisfied:

$$\sum_{i=1}^{6} \hat{J}_i = \hat{L} \tag{58}$$

$$\left\lfloor \frac{\hat{L}}{6} \right\rfloor \le \hat{J}_i \le \hat{J}_{i+1} \le \left\lceil \frac{\hat{L}}{6} \right\rceil \qquad \text{for } 1 \le i \le 5 \tag{59}$$

The table shown in Annex J lists the six block lengths for all possible values of $\hat{L}$. The first or lowest frequency block is denoted by $\hat{c}_{1,j}$ for $1 \le j \le \hat{J}_1$, and it consists of the first $\hat{J}_1$ consecutive elements of $\hat{T}_l$ (i.e. $1 \le l \le \hat{J}_1$). The second block is denoted by $\hat{c}_{2,j}$ for $1 \le j \le \hat{J}_2$, and it consists of the next $\hat{J}_2$ consecutive elements of $\hat{T}_l$ (i.e. $\hat{J}_1 + 1 \le l \le \hat{J}_1 + \hat{J}_2$). This continues through the sixth or highest frequency block, which is denoted by $\hat{c}_{6,j}$ for $1 \le j \le \hat{J}_6$. It consists of the last $\hat{J}_6$ consecutive elements of $\hat{T}_l$ (i.e. $\hat{L} + 1 - \hat{J}_6 \le l \le \hat{L}$). An example of this process is shown in Figure 17 for $\hat{L} = 34$.

Fig. 17: Prediction Residual Blocks for L = 34 (block lengths from low to high frequency: $\hat{J}_1 = 5$, $\hat{J}_2 = 5$, $\hat{J}_3 = 6$, $\hat{J}_4 = 6$, $\hat{J}_5 = 6$, $\hat{J}_6 = 6$)
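The block lengths can also be derived directly from constraints (58) and (59), as the sketch below shows; the tabulated block lengths in the annex remain the authoritative source. The function names are hypothetical.

```python
def block_lengths(L: int):
    """Six non-decreasing lengths in {floor(L/6), ceil(L/6)} that sum to L."""
    base, extra = divmod(L, 6)
    # The (L mod 6) longer blocks are placed at the high frequency end.
    return [base] * (6 - extra) + [base + 1] * extra


def split_blocks(residuals):
    """Cut the residual vector into six consecutive blocks, low band first."""
    blocks, start = [], 0
    for length in block_lengths(len(residuals)):
        blocks.append(residuals[start:start + length])
        start += length
    return blocks
```

For $\hat{L} = 34$ this yields lengths 5, 5, 6, 6, 6, 6, matching Figure 17.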

Each of the six blocks is transformed using a Discrete Cosine Transform (DCT), which is discussed in [7]. The length of the DCT for the i'th block is equal to $\hat{J}_i$. The DCT coefficients are denoted by $\hat{C}_{i,k}$, where $1 \le i \le 6$ refers to the block number, and $1 \le k \le \hat{J}_i$ refers to the particular coefficient within each block. The formula for the computation of these DCT coefficients is as follows:

$$\hat{C}_{i,k} = \frac{1}{\hat{J}_i} \sum_{j=1}^{\hat{J}_i} \hat{c}_{i,j} \cos\left[ \frac{\pi (k-1)(j - \frac{1}{2})}{\hat{J}_i} \right] \qquad \text{for } 1 \le k \le \hat{J}_i \tag{60}$$


The DCT coefficients from each of the six blocks are then divided into two groups. The first group consists of the first DCT coefficient from each of the six blocks. These coefficients are used to form a six element vector, $\hat{R}_i$ for $1 \le i \le 6$, where $\hat{R}_i = \hat{C}_{i,1}$. The vector $\hat{R}_i$ is referred to as the gain vector, and its construction is shown in Figure 18. The quantization of the gain vector is discussed in section 6.3.1.
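The block DCT of Equation (60) and the extraction of the gain vector can be sketched as follows. The names are hypothetical, and the inverse shown for comparison uses the weighting of 1 for the first coefficient and 2 for the others that the decoder applies in Section 6.4.2.

```python
import math


def dct(block):
    """Equation (60): forward DCT of one block (1-based k and j)."""
    J = len(block)
    return [sum(block[j - 1] * math.cos(math.pi * (k - 1) * (j - 0.5) / J)
                for j in range(1, J + 1)) / J
            for k in range(1, J + 1)]


def idct(coeffs):
    """Matching inverse: weight 1 for the first coefficient, 2 otherwise."""
    J = len(coeffs)
    return [sum((1 if k == 1 else 2) * coeffs[k - 1]
                * math.cos(math.pi * (k - 1) * (j - 0.5) / J)
                for k in range(1, J + 1))
            for j in range(1, J + 1)]


def gain_vector(blocks):
    """R_i is the first DCT coefficient of block i, i.e. the block mean."""
    return [dct(block)[0] for block in blocks]
```

Since the first DCT coefficient is the mean of the block, the gain vector is simply the average log2 residual in each of the six bands.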

The second group consists of the remaining higher order DCT coefficients. These coefficients correspond to $\hat{C}_{i,j}$, where $1 \le i \le 6$ and $2 \le j \le \hat{J}_i$. Note that if $\hat{J}_i = 1$, then there are no higher order DCT coefficients in the i'th block. The quantization of the higher order DCT coefficients is discussed in section 6.3.2.

Fig. 18: Formation of Gain Vector

One important feature of the spectral amplitude encoding algorithm is that the spectral amplitude information is transmitted differentially. Specifically, a prediction residual is transmitted which measures the change in the spectral envelope between the current frame and the previous frame. In order for a differential scheme of this type to work properly, the encoder must simulate the operation of the decoder and use the reconstructed spectral amplitudes from the previous frame to predict the spectral amplitudes of the current frame. The IMBE spectral amplitude encoder simulates the spectral amplitude decoder by setting $\tilde{L} = \hat{L}$ and then reconstructing the spectral amplitudes as discussed above. This is shown as the feedback path in Figure 16.





6.3.1 Encoding the Gain Vector 


The gain vector can be viewed as a coarse representation of the spectral envelope of the current segment of speech. The quantization of the gain vector begins with a six point DCT of $\hat{R}_i$ for $1 \le i \le 6$ as shown in the following equation.

$$\hat{G}_m = \frac{1}{6} \sum_{i=1}^{6} \hat{R}_i \cos\left[ \frac{\pi (m-1)(i - \frac{1}{2})}{6} \right] \qquad \text{for } 1 \le m \le 6 \tag{61}$$

The resulting vector, denoted by $\hat{G}_m$ for $1 \le m \le 6$, is quantized in two parts. The first element, $\hat{G}_1$, can be viewed as representing the overall gain or level of the speech segment. This element is quantized using the 6 bit non-uniform quantizer given in Annex E. The 6 bit value $\hat{b}_2$ is defined as the index of the quantizer value (as shown in Annex E) which is nearest to $\hat{G}_1$. The remaining five elements of $\hat{G}_m$ are quantized using uniform scalar quantizers where the five quantizer values $\hat{b}_3$ to $\hat{b}_7$ are computed from the vector elements as shown in Equation (62).


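$$\hat{b}_m = \begin{cases} 0 & \text{if } \left\lfloor \hat{G}_{m-1} / \hat{\Delta}_m \right\rfloor < -2^{\hat{B}_m - 1} \\ 2^{\hat{B}_m} - 1 & \text{if } \left\lfloor \hat{G}_{m-1} / \hat{\Delta}_m \right\rfloor \ge 2^{\hat{B}_m - 1} \\ \left\lfloor \hat{G}_{m-1} / \hat{\Delta}_m \right\rfloor + 2^{\hat{B}_m - 1} & \text{otherwise} \end{cases} \qquad \text{for } 3 \le m \le 7 \tag{62}$$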


The parameters $\hat{B}_m$ and $\hat{\Delta}_m$ in Equation (62) are the number of bits and the step sizes used to quantize each element. These values are dependent upon $\hat{L}$, which is the number of harmonics in the current frame. This dependence is tabulated in Annex F. Since $\hat{L}$ is known by the encoder, the correct values of $\hat{B}_m$ and $\hat{\Delta}_m$ are first obtained using Annex F and then the quantizer values $\hat{b}_m$ for $3 \le m \le 7$ are computed using Equation (62). The final step is to convert each quantizer value into an unsigned binary representation using the same method as shown in Table 2.
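The uniform quantizer of Equation (62), together with the mid-cell reconstruction that the decoder applies in Equation (68), can be sketched as below. The names are hypothetical; returning 0 from the encoder when no bits are allocated is an assumption, since Equation (62) is only meaningful for a positive bit count.

```python
import math


def quantize_uniform(x: float, step: float, bits: int) -> int:
    """Offset floor(x / step) by 2**(bits-1) and clamp to [0, 2**bits - 1]."""
    if bits == 0:
        return 0  # assumption: nothing is transmitted for this element
    half = 1 << (bits - 1)
    q = math.floor(x / step)
    if q < -half:
        return 0
    if q >= half:
        return (1 << bits) - 1
    return q + half


def dequantize_uniform(b: int, step: float, bits: int) -> float:
    """Reconstruct at the centre of the quantizer cell; 0 if no bits were sent."""
    if bits == 0:
        return 0.0
    return step * (b - (1 << (bits - 1)) + 0.5)
```

Inside the unclamped range the reconstruction error is at most half a step, because the encoder takes the floor and the decoder adds the .5 offset.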


6.3.2 Encoding the Higher Order DCT Coefficients 

Once the gain vector has been quantized, the remaining bits are used to encode the $\hat{L} - 6$ higher order DCT coefficients which complete the representation of the spectral amplitudes. Annex G shows the bit allocation as a function of $\hat{L}$ for these coefficients. For each value of $\hat{L}$ the $\hat{L} - 6$ entries, labeled $\hat{b}_8$ through $\hat{b}_{\hat{L}+1}$, provide the bit allocation for the higher order DCT coefficients. The adopted convention is that $[\hat{b}_8, \hat{b}_9, \ldots, \hat{b}_{\hat{L}+1}]$ correspond to $[\hat{C}_{1,2}, \hat{C}_{1,3}, \ldots, \hat{C}_{1,\hat{J}_1}, \ldots, \hat{C}_{6,2}, \hat{C}_{6,3}, \ldots, \hat{C}_{6,\hat{J}_6}]$, respectively.

Once the bit allocation for the higher order DCT coefficients has been obtained, these coefficients are quantized using uniform quantization. The step size used to quantize each coefficient must be computed from the bit allocation and the standard deviation of the DCT coefficient using Tables 3 and 4. For example, if 4 bits are allocated for a particular coefficient, then from Table 3 the step size, $\hat{\Delta}$, equals $.40\sigma$. If this was the third DCT coefficient from any block (i.e. $\hat{C}_{i,3}$), then $\sigma = .241$ as shown in Table 4. Performing this multiplication gives a step size of .0964. Once the bit allocation and the step sizes for the higher order DCT coefficients have been determined, then the bit encodings $\hat{b}_m$ for $8 \le m \le \hat{L} + 1$ are computed according to Equation (63).

Number of Bits    Step Size
1                 1.2 σ
2                 .85 σ
3                 .65 σ
4                 .40 σ
5                 .28 σ
6                 .15 σ
7                 .08 σ
8                 .04 σ
9                 .02 σ
10                .01 σ

Table 3: Uniform Quantizer Step Size for Higher Order DCT Coefficients


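$$\hat{b}_m = \begin{cases} 0 & \text{if } \left\lfloor \hat{C}_{i,k} / \hat{\Delta}_m \right\rfloor < -2^{\hat{B}_m - 1} \\ 2^{\hat{B}_m} - 1 & \text{if } \left\lfloor \hat{C}_{i,k} / \hat{\Delta}_m \right\rfloor \ge 2^{\hat{B}_m - 1} \\ \left\lfloor \hat{C}_{i,k} / \hat{\Delta}_m \right\rfloor + 2^{\hat{B}_m - 1} & \text{otherwise} \end{cases} \qquad \text{for } 8 \le m \le \hat{L} + 1 \tag{63}$$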


The parameters $\hat{b}_m$, $\hat{B}_m$ and $\hat{\Delta}_m$ in equation (63) refer to the quantizer value, the number of bits and the step size which has been computed for $\hat{C}_{i,k}$, respectively. Note that the relationship between $m$, $i$, and $k$ in Equation (63) is known and can be expressed as:

$$m = 6 + k + \sum_{n=1}^{i-1} \hat{J}_n \tag{64}$$

Finally, each quantizer value is converted into the appropriate unsigned binary representation which is analogous to that shown in Table 2.

DCT Coefficient      σ
$\hat{C}_{i,2}$      .307
$\hat{C}_{i,3}$      .241
$\hat{C}_{i,4}$      .207
$\hat{C}_{i,5}$      .190
$\hat{C}_{i,6}$      .179
$\hat{C}_{i,7}$      .173
$\hat{C}_{i,8}$      .165
$\hat{C}_{i,9}$      .170
$\hat{C}_{i,10}$     .170

Table 4: Standard Deviation of Higher Order DCT Coefficients

Fig. 19: Decoding of the Spectral Amplitudes
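The step size lookup described for the higher order DCT coefficients combines Table 3 and Table 4. A small sketch, with hypothetical names and the table values copied from above:

```python
# Table 3: step size as a multiple of sigma, indexed by the number of bits.
STEP_FACTOR = {1: 1.2, 2: 0.85, 3: 0.65, 4: 0.40, 5: 0.28,
               6: 0.15, 7: 0.08, 8: 0.04, 9: 0.02, 10: 0.01}

# Table 4: standard deviation of the k'th DCT coefficient of any block.
SIGMA = {2: 0.307, 3: 0.241, 4: 0.207, 5: 0.190, 6: 0.179,
         7: 0.173, 8: 0.165, 9: 0.170, 10: 0.170}


def step_size(bits: int, k: int) -> float:
    """Step size for the k'th coefficient of a block given its bit allocation."""
    return STEP_FACTOR[bits] * SIGMA[k]
```

Calling `step_size(4, 3)` reproduces the worked example in the text: .40 times .241 gives .0964.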


6.4 Spectral Amplitudes Decoding 

In order for the decoder to reconstruct the spectral amplitudes, the parameter $\tilde{L}$ must first be computed from $\tilde{b}_0$ using Equations (46) and (47). Then the spectral amplitudes can be decoded and reconstructed by inverting the quantization and encoding procedure described above. A block diagram of the spectral amplitude decoder is shown in Figure 19.

The first step in the spectral amplitude reconstruction process is to divide the spectral amplitudes into six blocks. The length of each block, $\tilde{J}_i$ for $1 \le i \le 6$, is adjusted to meet the following constraints.

$$\sum_{i=1}^{6} \tilde{J}_i = \tilde{L} \tag{65}$$

$$\left\lfloor \frac{\tilde{L}}{6} \right\rfloor \le \tilde{J}_i \le \tilde{J}_{i+1} \le \left\lceil \frac{\tilde{L}}{6} \right\rceil \qquad \text{for } 1 \le i \le 5 \tag{66}$$

The elements of these blocks are denoted by $\tilde{C}_{i,k}$, where $1 \le i \le 6$ denotes the block number and where $1 \le k \le \tilde{J}_i$ denotes the element within that block. The first element of each block is then set equal to the decoded gain vector, $\tilde{R}_i$, via equation (67). The formation of the decoded gain vector is discussed in Section 6.4.1.

$$\tilde{C}_{i,1} = \tilde{R}_i \qquad \text{for } 1 \le i \le 6 \tag{67}$$

The remaining elements of each block correspond to the decoded higher order DCT coefficients which are discussed in Section 6.4.2.


6.4.1 Decoding the Gain Vector 

The gain is decoded in two parts. First the six bit value $\tilde{b}_2$ is used to decode the first element of the transformed gain vector, denoted by $\tilde{G}_1$. This is done by using the 6 bit value $\tilde{b}_2$ as an index into the quantizer values listed in Annex E. Next the five quantizer values $\tilde{b}_3$ through $\tilde{b}_7$ are used to reconstruct the remaining five elements of the transformed gain vector, denoted by $\tilde{G}_2$ through $\tilde{G}_6$. This is done by using $\tilde{L}$, the number of harmonics in the current frame, in combination with the table in Annex F to establish the bit allocation and step size for each of these five elements. The relationship between the quantizer values and the transformed gain vector elements is expressed in Equation (68),

$$\tilde{G}_{m-1} = \begin{cases} 0 & \text{if } \tilde{B}_m = 0 \\ \tilde{\Delta}_m \left( \tilde{b}_m - 2^{\tilde{B}_m - 1} + .5 \right) & \text{otherwise} \end{cases} \qquad \text{for } 3 \le m \le 7 \tag{68}$$

where $\tilde{\Delta}_m$ and $\tilde{B}_m$ are the step sizes and the number of bits found via Annex F. Once the transformed gain vector has been reconstructed in this manner, the gain vector $\tilde{R}_i$ for $1 \le i \le 6$ must be computed through an inverse DCT of $\tilde{G}_m$ as shown in the following equations.

$$\tilde{R}_i = \sum_{m=1}^{6} \alpha(m)\, \tilde{G}_m \cos\left[ \frac{\pi (m-1)(i - \frac{1}{2})}{6} \right] \qquad \text{for } 1 \le i \le 6 \tag{69}$$

$$\alpha(m) = \begin{cases} 1 & \text{if } m = 1 \\ 2 & \text{otherwise} \end{cases} \tag{70}$$


6.4.2 Decoding the Higher Order DCT Coefficients 


The higher order DCT coefficients, which are denoted by $\tilde{C}_{i,k}$ for $1 \le i \le 6$ and $2 \le k \le \tilde{J}_i$, are reconstructed from the quantizer values $\tilde{b}_8$, $\tilde{b}_9$, ..., $\tilde{b}_{\tilde{L}+1}$. First the bit allocation table listed in Annex G is used to determine the appropriate bit allocation. The adopted convention is that $[\tilde{b}_8, \tilde{b}_9, \ldots, \tilde{b}_{\tilde{L}+1}]$ correspond to $[\tilde{C}_{1,2}, \tilde{C}_{1,3}, \ldots, \tilde{C}_{1,\tilde{J}_1}, \ldots, \tilde{C}_{6,2}, \tilde{C}_{6,3}, \ldots, \tilde{C}_{6,\tilde{J}_6}]$, respectively. Once the bit allocation has been determined the step sizes for each $\tilde{C}_{i,k}$ are computed using Tables 3 and 4. The determination of the bit allocation and the step sizes proceeds in the same manner as is discussed in Section 6.3.2. Using the notation $\tilde{B}_m$ and $\tilde{\Delta}_m$ to denote the number of bits and the step size, respectively, then each higher order DCT coefficient can be reconstructed according to the following formula,

$$\tilde{C}_{i,k} = \begin{cases} 0 & \text{if } \tilde{B}_m = 0 \\ \tilde{\Delta}_m \left( \tilde{b}_m - 2^{\tilde{B}_m - 1} + .5 \right) & \text{otherwise} \end{cases} \qquad \text{for } 8 \le m \le \tilde{L} + 1 \tag{71}$$

where as in Equation (64), the following equation can be used to relate $m$, $i$, and $k$.

$$m = 6 + k + \sum_{n=1}^{i-1} \tilde{J}_n \tag{72}$$

Once the DCT coefficients $\tilde{C}_{i,k}$ have been reconstructed, an inverse DCT is computed on each of the six blocks to form the vectors $\tilde{c}_{i,j}$. This is done using the following equations for $1 \le i \le 6$.

$$\tilde{c}_{i,j} = \sum_{k=1}^{\tilde{J}_i} \alpha(k)\, \tilde{C}_{i,k} \cos\left[ \frac{\pi (k-1)(j - \frac{1}{2})}{\tilde{J}_i} \right] \qquad \text{for } 1 \le j \le \tilde{J}_i \tag{73}$$

$$\alpha(k) = \begin{cases} 1 & \text{if } k = 1 \\ 2 & \text{otherwise} \end{cases} \tag{74}$$


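The six transformed blocks $\tilde{c}_{i,j}$ are then joined to form a single vector of length $\tilde{L}$, which is denoted $\tilde{T}_l$ for $1 \le l \le \tilde{L}$. The vector $\tilde{T}_l$ corresponds to the reconstructed spectral amplitude prediction residuals. The adopted convention is that the first $\tilde{J}_1$ elements of $\tilde{T}_l$ are equal to $\tilde{c}_{1,j}$ for $1 \le j \le \tilde{J}_1$. The next $\tilde{J}_2$ elements of $\tilde{T}_l$ are equal to $\tilde{c}_{2,j}$ for $1 \le j \le \tilde{J}_2$. This continues until the last $\tilde{J}_6$ elements of $\tilde{T}_l$ are equal to $\tilde{c}_{6,j}$ for $1 \le j \le \tilde{J}_6$. Finally, the reconstructed log2 spectral amplitudes for the current frame are computed using the following equations.

$$\tilde{k}_l = \frac{\tilde{L}(-1)}{\tilde{L}(0)} \cdot l \tag{75}$$

$$\tilde{\delta}_l = \tilde{k}_l - \lfloor \tilde{k}_l \rfloor \tag{76}$$

$$\log_2 \tilde{M}_l(0) = \tilde{T}_l + \rho\,(1 - \tilde{\delta}_l) \log_2 \tilde{M}_{\lfloor \tilde{k}_l \rfloor}(-1) + \rho\,\tilde{\delta}_l \log_2 \tilde{M}_{\lfloor \tilde{k}_l \rfloor + 1}(-1) - \frac{\rho}{\tilde{L}(0)} \sum_{\lambda=1}^{\tilde{L}(0)} \left[ (1 - \tilde{\delta}_\lambda) \log_2 \tilde{M}_{\lfloor \tilde{k}_\lambda \rfloor}(-1) + \tilde{\delta}_\lambda \log_2 \tilde{M}_{\lfloor \tilde{k}_\lambda \rfloor + 1}(-1) \right] \tag{77}$$

In order to reconstruct $\tilde{M}_l(0)$ using equations (75) through (77), the following assumptions are always made:

$$\tilde{M}_0(-1) = 1.0 \tag{78}$$

$$\tilde{M}_l(-1) = \tilde{M}_{\tilde{L}(-1)}(-1) \qquad \text{for } l > \tilde{L}(-1) \tag{79}$$

In addition it is assumed that upon initialization $\tilde{M}_l(-1) = 1$ for all $l$, and $\tilde{L}(-1) = 30$. Note that later sections of the IMBE decoder require the spectral amplitudes, $\tilde{M}_l$ for $1 \le l \le \tilde{L}$, which must be computed by applying the inverse log2 to each of the values computed with Equation (77).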
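The reconstruction of Equations (75) through (79) mirrors the encoder's residual computation with the signs of the prediction terms reversed. A minimal sketch follows, with hypothetical names; the prediction coefficient is passed in by the caller, and amplitudes are held in lists where index `l - 1` stores harmonic `l`.

```python
import math


def reconstruct_amplitudes(residuals, prev_amps, rho: float):
    """Rebuild the current frame's amplitudes from the decoded residuals."""
    L_cur, L_prev = len(residuals), len(prev_amps)

    def log_amp(idx: int) -> float:
        if idx == 0:
            return 0.0                                      # Equation (78)
        return math.log2(prev_amps[min(idx, L_prev) - 1])  # Equation (79)

    interp = []
    for l in range(1, L_cur + 1):
        k_l = L_prev * l / L_cur                            # Equation (75)
        base = math.floor(k_l)
        delta = k_l - base                                  # Equation (76)
        interp.append((1 - delta) * log_amp(base) + delta * log_amp(base + 1))
    mean_interp = sum(interp) / L_cur
    # Equation (77), followed by the inverse log2.
    return [2.0 ** (t + rho * p - rho * mean_interp)
            for t, p in zip(residuals, interp)]
```

From the initialization state (30 previous amplitudes equal to 1.0) a residual vector of all ones decodes to amplitudes of 2.0, since the prediction terms vanish.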

One final note is that the IMBE speech coder uses adaptive bit allocation and quantization which is dependent upon the number of harmonics in each frame. At the encoder the value $\hat{L}$ is used to determine the bit allocation and quantizer step sizes, while at the decoder the value $\tilde{L}$ is used to determine the bit allocation and quantizer step sizes. In order to ensure proper operation it is necessary that these two values be equal (i.e. $\tilde{L} = \hat{L}$). The encoder and decoder are designed to ensure this property except in the presence of a very large number of bit errors. In addition, the decoder is designed to detect frames where a large number of bit errors may prevent the generation of the correct bit allocation and quantizer step sizes. In this case the decoder discards the bits for the current frame and repeats the parameters from the previous frame. This is discussed in more detail in later sections of this document.

6.5 Synchronization Encoding and Decoding

A final one bit quantizer value is reserved in each speech frame for synchronization. This quantizer value, denoted by $\hat{b}_{\hat{L}+2}$, is set to an alternating sequence by the encoder. If this bit was set to 0 during the previous speech frame, then this bit should be set to a 1 for the current speech frame. Otherwise, if this bit was set to 1 during the previous speech frame, then this bit should be set to a 0 for the current speech frame. This is expressed in the following equation, where $\hat{b}_{\hat{L}+2}(0)$ refers to the value for the current frame, while $\hat{b}_{\hat{L}+2}(-1)$ refers to the value for the previous frame.

$$\hat{b}_{\hat{L}+2}(0) = \begin{cases} 0 & \text{if } \hat{b}_{\hat{L}+2}(-1) = 1 \\ 1 & \text{otherwise} \end{cases} \tag{80}$$

It is assumed that $\hat{b}_{\hat{L}+2}(0)$ should be set equal to 0 during the first frame following initialization.

The decoder may use this bit to establish synchronization. As presented in Section 7, this bit is not error correction encoded or modulated, and it is placed in a fixed offset relative to the beginning of each 144 bit frame of speech data. The decoder may check each possible offset in the received data stream and establish which offset is most likely to correspond to the synchronization bit. The beginning of each speech frame can then be established using the known distance between the beginning of each speech frame and the synchronization bit. Note that the number of received speech frames which is used to establish synchronization can be modified to trade off the probability of false synchronization, the synchronization delay, and the ability to acquire synchronization in the presence of bit errors. Also note that other synchronization fields may be provided outside the IMBE speech coder which may eliminate the need to use $\hat{b}_{\hat{L}+2}$ for synchronization.

Fig. 20: Encoder Bit Manipulations
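The synchronization bit of Section 6.5 and one possible offset search at the decoder can be sketched as follows. This is only an illustration of the idea of checking each possible offset: the names, the scoring rule (count of frame-to-frame alternations) and the default of 8 frames are all assumptions made here, not requirements of this document.

```python
def sync_bits(n_frames: int):
    """Equation (80): 0 on the first frame after initialization, then alternate."""
    return [i % 2 for i in range(n_frames)]


def find_sync_offset(bits, frame_len: int = 144, n_frames: int = 8) -> int:
    """Return the offset whose bit alternates most often across n_frames frames.

    bits must hold at least frame_len * n_frames received bits.
    """
    best_offset, best_score = 0, -1
    for offset in range(frame_len):
        column = [bits[offset + f * frame_len] for f in range(n_frames)]
        score = sum(a != b for a, b in zip(column, column[1:]))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset
```

Raising `n_frames` lowers the chance that ordinary speech bits mimic the alternating pattern, at the cost of a longer acquisition delay, which is the trade-off noted in the text.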


7 Bit Manipulations 

The IMBE speech coder uses a number of different bit manipulations in order to increase its robustness to channel degradations. The quantizer values, $\hat{b}_0$, ..., $\hat{b}_{\hat{L}+2}$, are first prioritized into a set of bit vectors, denoted by $\hat{u}_0$, ..., $\hat{u}_7$. These vectors are optionally encrypted, and then they are protected with error control codes, including both [23,12] Golay codes and [15,11] Hamming codes, to produce a set of code vectors denoted by $\hat{v}_0$, ..., $\hat{v}_7$. These code vectors are then modulated to produce a set of modulated code vectors denoted by $\hat{c}_0$, ..., $\hat{c}_7$. Finally, intra-frame bit interleaving is used on the modulated code vectors in order to spread the effect of short burst errors. A block diagram of the bit manipulations performed by the encoder is shown in Figure 20.

The IMBE decoder reverses the bit manipulations performed by the encoder. First the decoder de-interleaves each frame of 144 bits to obtain the eight modulated code vectors $\tilde{c}_0$, ..., $\tilde{c}_7$. The decoder then demodulates these vectors to produce the code vectors $\tilde{v}_0$, ..., $\tilde{v}_7$ and then error control decodes these code vectors to produce the bit vectors $\tilde{u}_0$, ..., $\tilde{u}_7$. In order to ensure sufficient performance it is necessary that the decoder decode all error control codes up to their maximum error correction capability (i.e. 3 errors for the Golay codes and 1 error for the Hamming codes). Optionally, soft-decision decoding can be used to further improve the robustness to bit errors. Note that at higher bit error rates (BER > 4%) soft-decision decoding can substantially improve performance. Next the decoder must decrypt the bit vectors (if encryption is employed at the encoder), and then it must rearrange the bit vectors to reconstruct the quantizer values, denoted by $\tilde{b}_0$, $\tilde{b}_1$, ..., $\tilde{b}_{\tilde{L}+2}$. These values are further decoded using the techniques described in Section 6 and finally used to synthesize the current frame of speech. A block diagram of the bit manipulations performed by the decoder is shown in Figure 21.

Fig. 21: Decoder Bit Manipulations

One should note that the IMBE speech decoder employs a number of different mechanisms to improve performance in the presence of bit errors. These mechanisms consist first of error control codes, which are able to remove a significant number of errors. In addition, the IMBE speech coder uses bit modulation combined with frame repeats and frame mutes to detect and discard highly corrupted frames. Finally, the IMBE speech decoder uses adaptive smoothing to reduce the perceived effect of any remaining errors. These mechanisms are all discussed in the following sections of this description.


7.1 Bit Prioritization 

The first bit manipulation performed by the IMBE encoder is a rearrangement of the quantizer values $\hat{b}_0$, $\hat{b}_1$, ..., $\hat{b}_{\hat{L}+2}$ into a set of 8 prioritized bit vectors denoted by $\hat{u}_0$, $\hat{u}_1$, ..., $\hat{u}_7$. The bit vectors $\hat{u}_0$ through $\hat{u}_3$ are 12 bits long, while the bit vectors $\hat{u}_4$ through $\hat{u}_6$ are 11 bits long, and the bit vector $\hat{u}_7$ is seven bits long. Throughout this section the convention has been adopted that bit N - 1, where N is the vector length, is the MSB and bit 0 is the LSB.

The prioritization of the quantizer values into the set of bit vectors begins with $\hat{u}_0$. The six most significant bits of $\hat{u}_0$ (i.e. bits 11 through 6) are set equal to the six most significant bits of $\hat{b}_0$ (i.e. bits 7 through 2). The next three most significant bits of $\hat{u}_0$ (i.e. bits 5 through 3) are set equal to the three most significant bits of $\hat{b}_2$ (i.e. bits 5 through 3). The remaining three bits of $\hat{u}_0$ are generated from the spectral amplitude quantizer values $\hat{b}_3$ through $\hat{b}_{\hat{L}+1}$. Specifically, these quantizer values are arranged as shown in Figure 22. In this figure the shaded areas represent the number of bits which were allocated to each of these values assuming $\hat{L} = 16$. Note that for other values of $\hat{L}$ this figure would change in accordance with the bit allocation information contained in Annex F and Annex G. The remaining three bits of $\hat{u}_0$ are then selected by beginning in the upper left hand corner of this figure (i.e. bit 10 of $\hat{b}_3$) and scanning left to right. When the end of any row is reached the scanning proceeds from left to right on the next lower row. Bit 2 of $\hat{u}_0$ is set equal to the bit corresponding to the first shaded block which is encountered using the prescribed scanning order. Similarly, bit 1 of $\hat{u}_0$ is set equal to the bit corresponding to the second shaded block which is encountered and bit 0 of $\hat{u}_0$ is set equal to the bit corresponding to the third shaded block which is encountered.

The scanning of the spectral amplitude quantizer values $\hat{b}_3$ through $\hat{b}_{\hat{L}+1}$ which is used to generate the last three bits of $\hat{u}_0$ is continued for the bit vectors $\hat{u}_1$ through $\hat{u}_3$. Each successive bit in these vectors is set equal to the bit corresponding to the next shaded block. This process begins with bit 11 of $\hat{u}_1$, proceeds through bit 0 of $\hat{u}_1$ followed by bit 11 of $\hat{u}_2$, and continues in this manner until finally reaching bit 0 of $\hat{u}_3$. At this point the 48 highest priority bits have been assigned to the bit vectors $\hat{u}_0$ through $\hat{u}_3$ as shown in Figure 23.

The next bits to be inserted into the bit vectors are all of the bits of $\hat{b}_1$ (starting with the MSB), followed by bit 2 and then bit 1 of $\hat{b}_2$, and then continuing with the scanning of $\hat{b}_3$ through $\hat{b}_{\hat{L}+1}$ as described above. These bits are inserted into the bit vectors beginning with bit 10 of $\hat{u}_4$, proceeding through bit 0 of $\hat{u}_4$ followed by bit 10 of $\hat{u}_5$, and continuing in this manner until finally reaching bit 4 of $\hat{u}_7$. The final four bits of $\hat{u}_7$, beginning with bit 3 and ending with bit 0, are set equal to bit 0 of $\hat{b}_2$, bit 1 of $\hat{b}_0$, bit 0 of $\hat{b}_0$, and then bit 0 of $\hat{b}_{\hat{L}+2}$, respectively. A block diagram of this procedure is shown in Figure 24 for $\hat{K} = 6$.

Fig. 22: Priority Scanning of $\hat{b}_3$ through $\hat{b}_{\hat{L}+1}$ (rows run from bit 10 at the top to bit 0 at the bottom; columns run from $\hat{b}_3$ on the left to $\hat{b}_{17} = \hat{b}_{\hat{L}+1}$ on the right)
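The row-by-row scan used to prioritize the spectral amplitude bits can be sketched as below. The function name is hypothetical, and the sketch assumes that a quantizer value with B bits occupies bit positions B - 1 down to 0 in the figure, so that rows above its most significant bit are skipped; that reading of the shaded blocks is an assumption.

```python
def priority_scan(bit_alloc, top_bit: int = 10):
    """Return (m, bit) pairs in scan order for quantizer values b_3, b_4, ...

    bit_alloc[j] is the number of bits allocated to b_{3+j}.
    """
    order = []
    for bit in range(top_bit, -1, -1):           # rows, most significant first
        for j, n_bits in enumerate(bit_alloc):   # columns, left to right
            if bit < n_bits:                     # shaded block: bit exists
                order.append((3 + j, bit))
    return order
```

The first three entries of the returned list would supply bits 2, 1 and 0 of the first bit vector, the next 36 entries would fill the following three 12 bit vectors, and the remainder of the scan continues after the V/UV and gain bits as described in the text.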

7.2 Encryption 

This document treats optional encryption and decryption as transparent elements and does
not attempt to define the actual encryption process. However, Figures 20 and 21 depict an
encryption and decryption element, respectively, in order to illustrate the proper placement
of these elements in the IMBE vocoder. During encryption the bit vectors $\hat{u}_0, \ldots, \hat{u}_7$ are
combined bit-by-bit with an encryption sequence. This same sequence must be used at the
decoder to recover the bit vectors $\hat{u}_0, \ldots, \hat{u}_7$. In order to be interoperable the encryption
and decryption processes must each use the same bit ordering. The standard ordering
begins with the most significant bit (MSB) of $\hat{u}_0$ and continues in order of significance until
the least significant bit (LSB) of $\hat{u}_0$ is reached. This is followed in order by the bit vectors
$\hat{u}_1$, $\hat{u}_2$, $\hat{u}_3$, $\hat{u}_4$, $\hat{u}_5$, $\hat{u}_6$ and $\hat{u}_7$, respectively, where each bit vector proceeds from MSB to
LSB. The reader is referred to the Project 25 Common Air Interface and the Project 25
Encryption Standard for more information on the encryption and decryption processes.
[Figure 23: block diagram. The 12 bit vector $\hat{u}_0$ is filled from $\hat{b}_0$, $\hat{b}_2$ and
$\hat{b}_3 \ldots \hat{b}_{\hat{L}+1}$ (after priority scanning); the 12 bit vectors $\hat{u}_1$, $\hat{u}_2$ and $\hat{u}_3$ are filled from
$\hat{b}_3 \ldots \hat{b}_{\hat{L}+1}$ (after priority scanning). Each vector passes through a [23,12] Golay encoder
to produce $\hat{v}_0$ through $\hat{v}_3$.]

Fig. 23: Formation of Code Vectors $\hat{v}_0$ through $\hat{v}_3$


7.3 Error Control Coding 

At 7.2 kbps with a 20 ms frame size, 144 bits per frame are available for voice coding. The 
IMBE speech coder uses 88 of these bits to quantize the model parameters and provide 
synchronization, and the remaining 56 bits are used for forward error correction. The 56 
error control bits are divided between four [23,12] Golay codes and three [15,11] Hamming 
codes. The reader is referred to references [8, 9] for more information on the encoding and 
decoding of Golay and Hamming codes. 

The generation of the eight code vectors $\hat{v}_i$ for $0 \le i \le 7$ is performed according to the
following set of equations,

$$\hat{v}_i = \hat{u}_i \cdot g_G \quad \text{for } 0 \le i \le 3 \tag{81}$$

$$\hat{v}_i = \hat{u}_i \cdot g_H \quad \text{for } 4 \le i \le 6 \tag{82}$$

$$\hat{v}_7 = \hat{u}_7 \tag{83}$$

where $g_G$ and $g_H$ are the generator matrices for the [23,12] Golay code and the [15,11]
Hamming code, respectively. These are shown below where absent entries are assumed to
equal zero. Note that all operations are modulo 2, and the vectors $\hat{v}_i$ and $\hat{u}_i$ are assumed to
be row vectors, where the "left" most bit is the MSB. This convention is used throughout
this section.

[Figure 24: block diagram. The vectors $\hat{u}_4$ through $\hat{u}_7$ are filled after priority scanning;
$\hat{u}_4$, $\hat{u}_5$ and $\hat{u}_6$ are Hamming encoded and $\hat{u}_7$ is passed through unchanged.]

Fig. 24: Formation of Code Vectors $\hat{v}_4$ through $\hat{v}_7$


Both generator matrices are systematic. Writing $I_n$ for the $n \times n$ identity matrix, they are

$$g_G = \left[\; I_{12} \;\middle|\; P_G \;\right], \qquad
P_G = \begin{bmatrix}
1&1&0&0&0&1&1&1&0&1&0\\
0&1&1&0&0&0&1&1&1&0&1\\
1&1&1&1&0&1&1&0&1&0&0\\
0&1&1&1&1&0&1&1&0&1&0\\
0&0&1&1&1&1&0&1&1&0&1\\
1&1&0&1&1&0&0&1&1&0&0\\
0&1&1&0&1&1&0&0&1&1&0\\
0&0&1&1&0&1&1&0&0&1&1\\
1&1&0&1&1&1&0&0&0&1&1\\
1&0&1&0&1&0&0&1&0&1&1\\
1&0&0&1&0&0&1&1&1&1&1\\
1&0&0&0&1&1&1&0&1&0&1
\end{bmatrix}$$

$$g_H = \left[\; I_{11} \;\middle|\; P_H \;\right], \qquad
P_H = \begin{bmatrix}
1&1&1&1\\
1&1&1&0\\
1&1&0&1\\
1&1&0&0\\
1&0&1&1\\
1&0&1&0\\
1&0&0&1\\
0&1&1&1\\
0&1&1&0\\
0&1&0&1\\
0&0&1&1
\end{bmatrix}$$
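
The encoding in Equations (81) and (82) can be illustrated with a short Python sketch. It is not part of the standard: the function names are hypothetical, and each bit vector is assumed to be held as an integer whose most significant bit is the "left" most bit of the row vector. The constants are the rows of the parity columns of $g_G$ and $g_H$ packed into integers.

```python
# Rows of the parity part of each generator matrix, packed MSB first.
GOLAY_PARITY = [0x63A, 0x31D, 0x7B4, 0x3DA, 0x1ED, 0x6CC,
                0x366, 0x1B3, 0x6E3, 0x54B, 0x49F, 0x475]
HAMMING_PARITY = [0xF, 0xE, 0xD, 0xC, 0xB, 0xA, 0x9, 0x7, 0x6, 0x5, 0x3]


def encode_systematic(u, parity_rows, parity_bits):
    """Multiply the row vector u by [I | P] over GF(2)."""
    k = len(parity_rows)
    parity = 0
    for row, p in enumerate(parity_rows):
        # Row 0 of the matrix corresponds to the MSB of u.
        if (u >> (k - 1 - row)) & 1:
            parity ^= p
    return (u << parity_bits) | parity


def golay_23_12(u):
    """Hypothetical helper: 12 bit input, 23 bit code vector."""
    return encode_systematic(u, GOLAY_PARITY, 11)


def hamming_15_11(u):
    """Hypothetical helper: 11 bit input, 15 bit code vector."""
    return encode_systematic(u, HAMMING_PARITY, 4)
```

Because both codes are systematic, the information bits can be read back from the upper bits of the code vector whenever no channel errors are present.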


7.4 Bit Modulation 

The IMBE speech coder uses bit modulation keyed off the code vector $\hat{v}_0$ to provide a
mechanism for detecting errors in $\hat{v}_0$ beyond the three errors that the [23,12] Golay code
can correct. Note that the term bit modulation in the context of this document refers to
the presented method for multiplying (or modulating) each frame of code vectors by a data
dependent pseudo-random sequence. The first step in this procedure is to generate a set
of binary modulation vectors which are added (modulo 2) to the code vectors $\hat{v}_0$ through
$\hat{v}_7$. The modulation vectors are generated from a pseudo-random sequence whose seed is
derived from $\hat{u}_0$. Specifically, the sequence defined in the following equations is used,

$$p_r(0) = 16\,\hat{u}_0 \tag{84}$$

$$p_r(n) = 173\,p_r(n-1) + 13849 - 65536 \left\lfloor \frac{173\,p_r(n-1) + 13849}{65536} \right\rfloor \tag{85}$$

where the bit vector $\hat{u}_0$ is interpreted as an unsigned 12 bit number in the range [0, 4095].
Equation (85) is used to recursively compute the pseudo-random sequence, $p_r(n)$, over the
range $1 \le n \le 114$. Each element of this sequence can be interpreted as a 16 bit random
number which is uniformly distributed over the interval [0, 65535]. Using this interpretation,
a set of binary modulation vectors, denoted by $\hat{m}_0$ through $\hat{m}_7$, are generated from this
sequence as shown below.


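$$\hat{m}_0 = [0, 0, \ldots, 0] \tag{86}$$

$$\hat{m}_1 = \left[ \left\lfloor \tfrac{p_r(1)}{32768} \right\rfloor, \left\lfloor \tfrac{p_r(2)}{32768} \right\rfloor, \ldots, \left\lfloor \tfrac{p_r(23)}{32768} \right\rfloor \right] \tag{87}$$

$$\hat{m}_2 = \left[ \left\lfloor \tfrac{p_r(24)}{32768} \right\rfloor, \left\lfloor \tfrac{p_r(25)}{32768} \right\rfloor, \ldots, \left\lfloor \tfrac{p_r(46)}{32768} \right\rfloor \right] \tag{88}$$

$$\hat{m}_3 = \left[ \left\lfloor \tfrac{p_r(47)}{32768} \right\rfloor, \left\lfloor \tfrac{p_r(48)}{32768} \right\rfloor, \ldots, \left\lfloor \tfrac{p_r(69)}{32768} \right\rfloor \right] \tag{89}$$

$$\hat{m}_4 = \left[ \left\lfloor \tfrac{p_r(70)}{32768} \right\rfloor, \left\lfloor \tfrac{p_r(71)}{32768} \right\rfloor, \ldots, \left\lfloor \tfrac{p_r(84)}{32768} \right\rfloor \right] \tag{90}$$

$$\hat{m}_5 = \left[ \left\lfloor \tfrac{p_r(85)}{32768} \right\rfloor, \left\lfloor \tfrac{p_r(86)}{32768} \right\rfloor, \ldots, \left\lfloor \tfrac{p_r(99)}{32768} \right\rfloor \right] \tag{91}$$

$$\hat{m}_6 = \left[ \left\lfloor \tfrac{p_r(100)}{32768} \right\rfloor, \left\lfloor \tfrac{p_r(101)}{32768} \right\rfloor, \ldots, \left\lfloor \tfrac{p_r(114)}{32768} \right\rfloor \right] \tag{92}$$

$$\hat{m}_7 = [0, 0, \ldots, 0] \tag{93}$$

Once these modulation vectors have been computed in this manner, the modulated code
vectors, $\hat{c}_i$ for $0 \le i \le 7$, are computed by adding (modulo 2) the code vectors to the
modulation vectors.

$$\hat{c}_i = \hat{v}_i + \hat{m}_i \quad \text{for } 0 \le i \le 7 \tag{94}$$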

One should note that the bit modulation performed by the IMBE encoder can be inverted
by the decoder if $\hat{c}_0$ does not contain any uncorrectable bit errors. In this case Golay
decoding $\hat{c}_0$, which always equals $\hat{v}_0$ since $\hat{m}_0 = 0$, will yield the correct value of $\hat{u}_0$. The
decoder can then use $\hat{u}_0$ to reconstruct the pseudo-random sequence and the modulation
vectors $\hat{m}_1$ through $\hat{m}_7$. Subtracting these vectors from $\hat{c}_1$ through $\hat{c}_7$ will then yield the
code vectors $\hat{v}_1$ through $\hat{v}_7$. At this point the remaining error control decoding can be
performed. In the other case, where $\hat{c}_0$ contains uncorrectable bit errors, the modulation
cannot generally be inverted by the decoder. In this case the likely result of Golay decoding
$\hat{c}_0$ will be some $\tilde{u}_0$ which does not equal $\hat{u}_0$. Consequently the decoder will initialize the
pseudo-random sequence incorrectly, and the modulation vectors computed by the decoder
will be uncorrelated with the modulation vectors used by the encoder. Using these incorrect
modulation vectors to reconstruct the code vectors is essentially the same as passing $\hat{v}_1, \ldots, \hat{v}_6$
through a 50 percent bit error rate (BER) channel. The IMBE decoder exploits the fact
that, statistically, a 50 percent BER causes the Golay and Hamming codes employed on $\hat{v}_1$
through $\hat{v}_6$ to correct a number of errors which is near the maximum capability of the code.
By counting the total number of errors which are corrected in all of these code vectors,
the decoder is able to reliably detect frames in which $\hat{c}_0$ is likely to contain uncorrectable
bit errors. The decoder performs frame repeats during these frames in order to reduce
the perceived degradation in the presence of bit errors. This is explained more fully in
Sections 7.6 and 7.7.
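
As an illustration of Equations (84) through (93), the following Python sketch generates the eight modulation vectors from the 12 bit value of $\hat{u}_0$. The function name is hypothetical and the vectors are assumed to be stored as lists of bits with the MSB first; it is a sketch of the recursion, not a normative implementation.

```python
def modulation_vectors(u0):
    """Return [m0, ..., m7] as lists of bits for a 12 bit seed u0."""
    if not 0 <= u0 <= 4095:
        raise ValueError("u0 must be a 12 bit value")
    p = 16 * u0                      # Equation (84)
    vectors = [[0] * 23]             # m0 is all zero (23 bit Golay vector)
    for length in (23, 23, 23, 15, 15, 15):   # m1..m3 Golay, m4..m6 Hamming
        bits = []
        for _ in range(length):
            # Equation (85): the floor term is a reduction modulo 65536.
            p = (173 * p + 13849) % 65536
            bits.append(p // 32768)  # keep only the top bit of the 16 bit value
        vectors.append(bits)
    vectors.append([0] * 7)          # m7 is all zero (7 uncoded bits)
    return vectors
```

Since addition is modulo 2, the same vectors serve for both modulation at the encoder and demodulation at the decoder.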

7.5 Bit Interleaving 

Intra-frame bit interleaving is used to spread short bursts of errors among several code
words. The division of each frame of 144 bits into 72 dibit symbols is tabulated in
Annex H. The minimum separation between any two bits of the same error correction
code is 3 symbols. That annex uses the notation $\hat{c}_i(n)$ to designate the n'th bit of the
modulated code vector $\hat{c}_i$ (or the demodulated code vector $\tilde{c}_i$). Note that bit $N - 1$ (where
$N$ is the vector length) is the MSB of each vector and bit 0 is the LSB. The speech coder bits
should be inserted into the Project 25 frame format beginning with symbol 0 and ending
with symbol 71. This is described more completely in the Project 25 Common Air Interface
specification.

7.6 Error Estimation 

The IMBE speech decoder estimates the number of errors in each received data frame by
computing the number of errors corrected by each of the [23,12] Golay and [15,11] Hamming codes.
The number of errors for each code vector is denoted $\epsilon_i$ for $0 \le i \le 6$, where $\epsilon_i$ refers to
the number of bit errors which were corrected during the error decoding of $\hat{v}_i$. From these
error values two other error parameters are computed as shown below.

$$\epsilon_T = \sum_{i=0}^{6} \epsilon_i \tag{95}$$

$$\epsilon_R(0) = .95\,\epsilon_R(-1) + .000365\,\epsilon_T \tag{96}$$

The parameter $\epsilon_R(0)$ is the estimate of the error rate for the current frame, while $\epsilon_R(-1)$
is the estimate of the error rate for the previous frame. These error parameters are used to
control the frame repeat process described below, and to control the parameter smoothing
functions described in Section 9.

7.7 Frame Repeats 

The IMBE decoder examines each received data frame in order to detect and discard frames
which are highly corrupted. A number of different fault conditions are checked and if any of
these conditions indicate the current frame is invalid, then a frame repeat is performed. The
IMBE speech encoder uses values of $\hat{b}_0$ in the range $0 \le \hat{b}_0 \le 207$ to represent valid pitch
estimates. The remaining values of $\hat{b}_0$ are reserved for future expansion and are currently
considered invalid. A frame repeat is performed by the decoder if it receives an invalid value
of $\hat{b}_0$, or if both of the following two equations are true.

$$\epsilon_0 \ge 2 \tag{97}$$

$$\epsilon_T \ge 10 + 40\,\epsilon_R \tag{98}$$

These two equations are used to detect the incorrect bit demodulation which results if there
are uncorrectable bit errors in $\hat{c}_0$. The decoder performs a frame repeat by taking the
following steps:

1) The current 144 bit received data frame is marked as invalid and subsequently ignored
during future processing steps.

2) The IMBE model parameters for the current frame are set equal to the IMBE model
parameters for the previous frame. Specifically, the following update expressions are
computed.

$$\tilde{\omega}_0(0) = \tilde{\omega}_0(-1) \tag{99}$$

$$\tilde{L}(0) = \tilde{L}(-1) \tag{100}$$

$$\tilde{K}(0) = \tilde{K}(-1) \tag{101}$$

$$\tilde{v}_k(0) = \tilde{v}_k(-1) \quad \text{for } 1 \le k \le \tilde{K} \tag{102}$$

$$\tilde{M}_l(0) = \tilde{M}_l(-1) \quad \text{for } 1 \le l \le \tilde{L} \tag{103}$$

$$\bar{M}_l(0) = \bar{M}_l(-1) \quad \text{for } 1 \le l \le \tilde{L} \tag{104}$$

3) The repeated model parameters are used in all future processing wherever the current
model parameters are required. This includes the synthesis of the current segment of speech
as is described in Section 11.

7.8 Frame Muting 

The IMBE decoder is required to mute in severe bit error environments for which $\epsilon_R > .0875$.
This capability causes the decoder to squelch its output if reliable communication cannot
be supported.

The recommended muting method is to first compute the update equations as listed in
step (2) of the frame repeat process (see Section 7.7). The decoder should then bypass the
speech synthesis algorithm described in Section 11 and, instead, set the synthetic speech
signal, $\tilde{s}(n)$, to random noise which is uniformly distributed over the interval [-5, 5]. This
technique provides for a small amount of "comfort noise" as is typically done in
telecommunication systems.
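
The decisions of Sections 7.6 through 7.8 can be summarised in a small Python sketch. The function names and the string return values are hypothetical, and checking the muting condition before the repeat condition is an assumption made for illustration; the text only requires that muting occurs whenever $\epsilon_R > .0875$.

```python
def update_error_rate(prev_rate, errors):
    """errors holds the corrected-error counts for code vectors 0..6."""
    e_total = sum(errors)                              # Equation (95)
    e_rate = 0.95 * prev_rate + 0.000365 * e_total     # Equation (96)
    return e_total, e_rate


def frame_action(b0, errors, prev_rate):
    """Return ('decode' | 'repeat' | 'mute', new error rate)."""
    e_total, e_rate = update_error_rate(prev_rate, errors)
    if e_rate > 0.0875:                    # Section 7.8: frame muting
        return "mute", e_rate
    invalid_pitch = not 0 <= b0 <= 207     # reserved values of b0
    # Equations (97) and (98) must both hold to trigger a repeat.
    bad_demodulation = errors[0] >= 2 and e_total >= 10 + 40 * e_rate
    if invalid_pitch or bad_demodulation:
        return "repeat", e_rate
    return "decode", e_rate
```

For example, `frame_action(100, [3, 3, 3, 3, 1, 1, 1], 0.0)` reports a repeat, since 15 corrected errors exceed the threshold of Equation (98) at that error rate.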


8 Spectral Amplitude Enhancement 


The IMBE speech decoder attempts to improve the perceived quality of the synthesized
speech by enhancing the spectral amplitudes. The unenhanced spectral amplitudes are
required by future frames in the computation of Equation (77). However, the enhanced
spectral amplitudes are used in speech synthesis. The spectral amplitude enhancement is
accomplished by generating a set of spectral weights from the model parameters of the
current frame. First $R_{M0}$ and $R_{M1}$ are calculated as shown below

$$R_{M0} = \sum_{l=1}^{\tilde{L}} \tilde{M}_l^2 \tag{105}$$

$$R_{M1} = \sum_{l=1}^{\tilde{L}} \tilde{M}_l^2 \cos(\tilde{\omega}_0\, l) \tag{106}$$

Next, the parameters $R_{M0}$ and $R_{M1}$ are used to calculate a set of weights, $W_l$, given by

$$W_l = \sqrt{\tilde{M}_l}\, \left[ \frac{.96\pi \left( R_{M0}^2 + R_{M1}^2 - 2 R_{M0} R_{M1} \cos(\tilde{\omega}_0\, l) \right)}{\tilde{\omega}_0\, R_{M0} \left( R_{M0}^2 - R_{M1}^2 \right)} \right]^{1/4} \quad \text{for } 1 \le l \le \tilde{L} \tag{107}$$

These weights are then used to enhance the spectral amplitudes for the current frame
according to the relationship:

$$\bar{M}_l = \begin{cases}
\tilde{M}_l & \text{if } 8l \le \tilde{L} \\
1.2 \cdot \tilde{M}_l & \text{else if } W_l > 1.2 \\
.5 \cdot \tilde{M}_l & \text{else if } W_l < .5 \\
W_l \cdot \tilde{M}_l & \text{otherwise}
\end{cases} \quad \text{for } 1 \le l \le \tilde{L} \tag{108}$$

A final step is to scale the enhanced spectral amplitudes in order to remove any energy
difference between the enhanced and unenhanced amplitudes. The correct scale factor,
denoted by $\gamma$, is given below.

$$\gamma = \sqrt{ \frac{R_{M0}}{\sum_{l=1}^{\tilde{L}} |\bar{M}_l|^2} } \tag{109}$$

This scale factor is applied to each of the enhanced spectral amplitudes as shown
in Equation (110).

$$\bar{M}_l = \gamma \cdot \bar{M}_l \quad \text{for } 1 \le l \le \tilde{L} \tag{110}$$

For notational simplicity this equation refers to both the scaled and unscaled spectral
amplitudes as $\bar{M}_l$. This convention has been adopted since the unscaled amplitudes are discarded
and only the scaled amplitudes are subsequently used by the decoder during parameter
smoothing and speech synthesis.

The value of $R_{M0}$ expressed in Equation (105) is a measure of the energy in the current
frame. This value is used to update a local energy parameter in accordance with the
following rule.

$$S_E(0) = \begin{cases}
.95\, S_E(-1) + .05\, R_{M0} & \text{if } .95\, S_E(-1) + .05\, R_{M0} \ge 10000.0 \\
10000.0 & \text{otherwise}
\end{cases} \tag{111}$$

This equation generates the local energy parameter for the current frame, $S_E(0)$, from
$R_{M0}$ and the value of the local energy parameter from the previous frame $S_E(-1)$. The
parameter $S_E(0)$ is used in the following section.
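
The enhancement procedure of Equations (105) through (111) is sketched below in Python. The function names are hypothetical, amplitudes are assumed to be held in a list indexed from $l = 1$ at position 0, and the sketch assumes at least one non-zero amplitude and $|R_{M1}| < R_{M0}$ so that the denominators are non-zero.

```python
import math


def enhance(M, w0):
    """Return (enhanced amplitudes, R_M0) for unenhanced amplitudes M."""
    L = len(M)
    r0 = sum(m * m for m in M)                                   # (105)
    r1 = sum(m * m * math.cos(w0 * l)
             for l, m in enumerate(M, start=1))                  # (106)
    out = []
    for l, m in enumerate(M, start=1):
        if 8 * l <= L:
            out.append(m)            # lowest harmonics are left unweighted
            continue
        num = 0.96 * math.pi * (r0 * r0 + r1 * r1
                                - 2.0 * r0 * r1 * math.cos(w0 * l))
        den = w0 * r0 * (r0 * r0 - r1 * r1)
        w = math.sqrt(m) * (num / den) ** 0.25                   # (107)
        w = min(max(w, 0.5), 1.2)    # clamp, equivalent to the cases of (108)
        out.append(w * m)
    gamma = math.sqrt(r0 / sum(x * x for x in out))              # (109)
    return [gamma * x for x in out], r0                          # (110)


def update_local_energy(prev_energy, r0):
    """Equation (111): smoothed energy with a floor of 10000."""
    return max(0.95 * prev_energy + 0.05 * r0, 10000.0)
```

After scaling by $\gamma$ the sum of the squared enhanced amplitudes equals $R_{M0}$, which is a convenient check when testing an implementation.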


9 Adaptive Smoothing 

As part of the error control process described in Section 7.6, the decoder estimates two
error rate parameters, $\epsilon_T$ and $\epsilon_R$, which measure the total number of errors and the local
error rate for the current frame, respectively. These parameters are used by the decoder to
adaptively smooth the decoded model parameters. The result is improved performance in
high bit error environments.

The first parameters to be smoothed by the decoder are the V/UV decisions. First an
adaptive threshold $V_M$ is calculated using Equation (112),

$$V_M = \begin{cases}
\infty & \text{if } \epsilon_R(0) \le .005 \text{ and } \epsilon_T \le 4 \\
\dfrac{45.255\, (S_E(0))^{.375}}{\exp(277.26\, \epsilon_R(0))} & \text{else if } \epsilon_R(0) \le .0125 \text{ and } \epsilon_4 = 0 \\
1.414\, (S_E(0))^{.375} & \text{otherwise}
\end{cases} \tag{112}$$

where the energy parameter $S_E(0)$ is defined in Equation (111) in Section 8. After the
adaptive threshold is computed each enhanced spectral amplitude $\bar{M}_l$ for $1 \le l \le \tilde{L}$ is compared
against $V_M$, and if $\bar{M}_l > V_M$ then the V/UV decision for that spectral amplitude is declared
voiced, regardless of the decoded V/UV decision. Otherwise the decoded V/UV decision for
that spectral amplitude is left unchanged. This process can be expressed mathematically
as shown below.

$$\tilde{v}_l = \begin{cases}
1 & \text{if } \bar{M}_l > V_M \\
\tilde{v}_l & \text{otherwise}
\end{cases} \quad \text{for } 1 \le l \le \tilde{L} \tag{113}$$

Once the V/UV decisions have been smoothed, the decoder adaptively smooths the
spectral amplitudes $\bar{M}_l$ for $1 \le l \le \tilde{L}$. The spectral amplitude smoothing algorithm computes
the following amplitude measure for the current segment.

$$A_M = \sum_{l=1}^{\tilde{L}} \bar{M}_l \tag{114}$$

Next an amplitude threshold is updated according to the following equation,

$$\tau_M(0) = \begin{cases}
20480 & \text{if } \epsilon_R(0) \le .005 \text{ and } \epsilon_T(0) \le 6 \\
6000 - 300\,\epsilon_T + \tau_M(-1) & \text{otherwise}
\end{cases} \tag{115}$$

where $\tau_M(0)$ and $\tau_M(-1)$ represent the value of the amplitude threshold for the current and
previous frames respectively. The two parameters $A_M$ and $\tau_M(0)$ are then used to compute
a scale factor $\gamma_M$ given below.

$$\gamma_M = \begin{cases}
1.0 & \text{if } \tau_M(0) > A_M \\
\dfrac{\tau_M(0)}{A_M} & \text{otherwise}
\end{cases} \tag{116}$$
This scale factor is multiplied by each of the spectral amplitudes $\bar{M}_l$ for $1 \le l \le \tilde{L}$. Note
that this step must be completed after spectral amplitude enhancement has been performed
using the methods of Section 8 and after $V_M$ has been computed according to Equation (112).
The correct sequence is shown in Figure 25.

[Figure 25: block diagram. The spectral amplitudes $\tilde{M}_l$ and the V/UV decisions $\tilde{v}_l$,
$1 \le l \le \tilde{L}$, together with the error parameters $\epsilon_T$ and $\epsilon_R$, pass through parameter
enhancement and smoothing before going to speech synthesis.]

Fig. 25: Parameter Enhancement and Smoothing
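
The smoothing steps of Equations (112) through (116) can be sketched in Python as follows. The function names and argument order are hypothetical; `e4` stands for the error count $\epsilon_4$ and `s_energy` for $S_E(0)$.

```python
import math


def vuv_threshold(e_rate, e_total, e4, s_energy):
    """Adaptive threshold V_M of Equation (112)."""
    if e_rate <= 0.005 and e_total <= 4:
        return math.inf
    if e_rate <= 0.0125 and e4 == 0:
        return 45.255 * s_energy ** 0.375 / math.exp(277.26 * e_rate)
    return 1.414 * s_energy ** 0.375


def smooth(M, voiced, e_rate, e_total, e4, s_energy, tau_prev):
    """Return (smoothed amplitudes, smoothed V/UV flags, new threshold)."""
    v_m = vuv_threshold(e_rate, e_total, e4, s_energy)
    # Equation (113): large amplitudes are forced to voiced.
    voiced = [1 if m > v_m else v for m, v in zip(M, voiced)]
    a_m = sum(M)                                         # Equation (114)
    if e_rate <= 0.005 and e_total <= 6:                 # Equation (115)
        tau = 20480.0
    else:
        tau = 6000.0 - 300.0 * e_total + tau_prev
    gamma = 1.0 if tau > a_m else tau / a_m              # Equation (116)
    return [gamma * m for m in M], voiced, tau
```

In an error free channel the threshold is infinite and the scale factor is 1.0 unless the amplitude sum exceeds 20480, so the decoded parameters normally pass through unchanged.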
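$i$    $\hat{J}_i$    $\hat{c}_{i,1} \ldots \hat{c}_{i,\hat{J}_i}$
1      2              $\hat{T}_1, \hat{T}_2$
2      2              $\hat{T}_3, \hat{T}_4$
3      3              $\hat{T}_5, \hat{T}_6, \hat{T}_7$
4      3              $\hat{T}_8, \hat{T}_9, \hat{T}_{10}$
5      3              $\hat{T}_{11}, \hat{T}_{12}, \hat{T}_{13}$
6      3              $\hat{T}_{14}, \hat{T}_{15}, \hat{T}_{16}$

Table 5: Division of Prediction Residuals into Blocks in Encoding Example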

10 Parameter Encoding Example 

This section provides an example of the quantization and bit manipulation for a typical 
parameter frame. In this example the fundamental frequency is assumed to be equal to 
ujo = 35 ^x 25 ~ Since the values of L and K are related to ujq through equations (37) and (38), 
they are equal to L = 16 and K = 6. The remaining model parameters are left unspecified 
since they do not affect the numbers presented in this example. 

The encoding of this example parameter frame proceeds as follows. First the
fundamental frequency is encoded into the 8 bit value $\hat{b}_0$ using equation (45), and the 6
voiced/unvoiced decisions are encoded into the 6 bit value $\hat{b}_1$ using equation (49). The
16 spectral amplitude prediction residuals, $\hat{T}_l$ for $1 \le l \le 16$, are then formed using
equations (52) through (55). Next, these prediction residuals are divided into six blocks where
the lengths of each block, $\hat{J}_i$ for $1 \le i \le 6$, are shown in Table 5. The spectral amplitude
prediction residuals are then divided into the six vectors $\hat{c}_{i,j}$ for $1 \le i \le 6$ and $1 \le j \le \hat{J}_i$.
The first $\hat{J}_1$ elements of $\hat{T}_l$ form $\hat{c}_{1,j}$. The next $\hat{J}_2$ elements of $\hat{T}_l$ form $\hat{c}_{2,j}$, and so on. This
is shown in Table 5. Each block $\hat{c}_{i,j}$, for $1 \le i \le 6$, is transformed with a $\hat{J}_i$ point DCT
using equation (60) to produce the set of DCT coefficients $\hat{C}_{i,k}$ for $1 \le k \le \hat{J}_i$. The first DCT
coefficient from each of the six blocks is used to form the gain vector $\hat{R}_i$. The gain vector
is then transformed into the vector $\hat{G}_m$ using the six point DCT shown in Equation (61).
The first element of the transformed gain vector, denoted by $\hat{G}_1$, is then quantized using
the non-uniform quantizer tabulated in Annex E. The 6 bit value $\hat{b}_2$ is set equal
to the index of the quantizer element which is closest to $\hat{G}_1$. The remaining five elements
of the transformed gain vector are quantized using the uniform quantizers generated from
Annex F with L = 16. The correct step sizes and bit allocation for $\hat{G}_2$ through $\hat{G}_6$
are shown in Table 6. Once the bit allocation and step sizes have been computed, the bit
encodings $\hat{b}_3$ through $\hat{b}_7$ are generated using Equation (62).

$m$    $\hat{G}_m$    $\hat{B}_m$    $\hat{\Delta}_m$
2      $\hat{G}_2$    6
3      $\hat{G}_3$    6
4      $\hat{G}_4$    6
5      $\hat{G}_5$    5
6      $\hat{G}_6$    5              .03696

Table 6: Example Bit Allocation and Step Size for the Transformed Gain Vector

After the gain vector has been quantized and encoded, the remaining bits are distributed
among the ten higher order DCT coefficients, $\hat{C}_{i,k}$ for $1 \le i \le 6$ and $2 \le k \le \hat{J}_i$. This
is done using Annex G and the resulting bit allocation is shown in Table 7. Each
DCT coefficient is then quantized using equation (63). The step sizes for these quantizers
are computed using Tables 3 and 4, and the results are shown in Table 7.

Finally, the one bit synchronization value, denoted by $\hat{b}_{18}$, is generated as a sequence
which alternates between 0 and 1 on each successive 20 ms voice frame. The 19 bit
encodings, $\hat{b}_0$ through $\hat{b}_{18}$, are then rearranged into the eight bit vectors $\hat{u}_0$ through $\hat{u}_7$. This is
accomplished using the procedure described in Section 7, and the result is shown in Tables 8
through 10. The convention in these tables is that the appropriate bit from the vector listed
in the first two columns is set equal to the appropriate bit from the bit encoding listed in
the last two columns, where the least significant bit corresponds to bit 1.

The four bit vectors $\hat{u}_0$ through $\hat{u}_3$ are each encoded with a [23,12] Golay code into
the code vectors $\hat{v}_0$ through $\hat{v}_3$, respectively. Similarly, the three bit vectors $\hat{u}_4$ through
$\hat{u}_6$ are each encoded with a [15,11] Hamming code into the code vectors $\hat{v}_4$ through $\hat{v}_6$,
respectively. The vector $\hat{v}_7$ is set equal to $\hat{u}_7$. These code vectors are then modulated
using Equations (84) through (94) to produce the modulated code vectors $\hat{c}_0$ through $\hat{c}_7$.
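
The rearrangement of the bit encodings into the eight bit vectors can be reproduced with the Python sketch below. It is an illustration only: the function name is hypothetical, bits are numbered from 0 at the LSB (one less than the numbering used in Tables 8 through 10), and the assignment of the six MSBs of $\hat{b}_0$ and the three MSBs of $\hat{b}_2$ to the top of $\hat{u}_0$ is read off the example tables rather than restated from Section 7.

```python
def bit_vector_sources(widths):
    """widths[m] is the bit length of b_m for m = 0 .. L+2.

    Returns eight lists u[0..7]; each entry is (m, bit) naming the source
    bit (bit 0 = LSB), listed from the MSB of the vector downwards.
    """
    L = len(widths) - 3
    top = max(widths[3:L + 2])
    # Priority scan of b_3 .. b_{L+1}: row by row from the highest bit
    # position, left to right, visiting only values that have that bit.
    scan = [(m, j) for j in range(top - 1, -1, -1)
            for m in range(3, L + 2) if widths[m] > j]
    head = ([(0, j) for j in range(7, 1, -1)]          # six MSBs of b_0
            + [(2, 5), (2, 4), (2, 3)]                 # three MSBs of b_2
            + scan[:39])                               # fills u_0 .. u_3
    tail = ([(1, j) for j in range(widths[1] - 1, -1, -1)]   # all of b_1
            + [(2, 2), (2, 1)]
            + scan[39:]
            + [(2, 0), (0, 1), (0, 0), (L + 2, 0)])    # last four bits of u_7
    stream = head + tail
    vectors, pos = [], 0
    for size in (12, 12, 12, 12, 11, 11, 11, 7):
        vectors.append(stream[pos:pos + size])
        pos += size
    return vectors


# Bit allocation of the example frame (L = 16, K = 6).
EXAMPLE_WIDTHS = [8, 6, 6, 6, 6, 6, 5, 5, 6, 6, 5, 4, 4, 3, 3, 3, 3, 2, 1]
```

Calling `bit_vector_sources(EXAMPLE_WIDTHS)` yields, for instance, the three LSBs of $\hat{u}_0$ as the top bits of $\hat{b}_3$, $\hat{b}_4$ and $\hat{b}_5$.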
$m$    $\hat{C}_{i,k}$    $\hat{B}_m$    $\hat{\Delta}_m$
8      $\hat{C}_{1,2}$    6              .04605
9      $\hat{C}_{2,2}$    6              .04605
10     $\hat{C}_{3,2}$    5              .08596
11     $\hat{C}_{3,3}$    4              .09640
12     $\hat{C}_{4,2}$    4              .12280
13     $\hat{C}_{4,3}$    3              .15665
14     $\hat{C}_{5,2}$    3              .19955
15     $\hat{C}_{5,3}$    3              .15665
16     $\hat{C}_{6,2}$    3              .19955
17     $\hat{C}_{6,3}$    2              .20485

Table 7: Example Bit Allocation and Step Size for Higher Order DCT Coefficients


The eight modulated code vectors are then interleaved as specified in Annex H, and
finally the frame bits are embedded into the Project 25 frame format in ascending order
(i.e. frame bit 1 first and frame bit 144 last).
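Vector         Bit Number    Vector            Bit Number
$\hat{u}_0$    12            $\hat{b}_0$       8
$\hat{u}_0$    11            $\hat{b}_0$       7
$\hat{u}_0$    10            $\hat{b}_0$       6
$\hat{u}_0$    9             $\hat{b}_0$       5
$\hat{u}_0$    8             $\hat{b}_0$       4
$\hat{u}_0$    7             $\hat{b}_0$       3
$\hat{u}_0$    6             $\hat{b}_2$       6
$\hat{u}_0$    5             $\hat{b}_2$       5
$\hat{u}_0$    4             $\hat{b}_2$       4
$\hat{u}_0$    3             $\hat{b}_3$       6
$\hat{u}_0$    2             $\hat{b}_4$       6
$\hat{u}_0$    1             $\hat{b}_5$       6
$\hat{u}_1$    12            $\hat{b}_8$       6
$\hat{u}_1$    11            $\hat{b}_9$       6
$\hat{u}_1$    10            $\hat{b}_3$       5
$\hat{u}_1$    9             $\hat{b}_4$       5
$\hat{u}_1$    8             $\hat{b}_5$       5
$\hat{u}_1$    7             $\hat{b}_6$       5
$\hat{u}_1$    6             $\hat{b}_7$       5
$\hat{u}_1$    5             $\hat{b}_8$       5
$\hat{u}_1$    4             $\hat{b}_9$       5
$\hat{u}_1$    3             $\hat{b}_{10}$    5
$\hat{u}_1$    2             $\hat{b}_3$       4
$\hat{u}_1$    1             $\hat{b}_4$       4
$\hat{u}_2$    12            $\hat{b}_5$       4
$\hat{u}_2$    11            $\hat{b}_6$       4
$\hat{u}_2$    10            $\hat{b}_7$       4
$\hat{u}_2$    9             $\hat{b}_8$       4
$\hat{u}_2$    8             $\hat{b}_9$       4
$\hat{u}_2$    7             $\hat{b}_{10}$    4
$\hat{u}_2$    6             $\hat{b}_{11}$    4
$\hat{u}_2$    5             $\hat{b}_{12}$    4
$\hat{u}_2$    4             $\hat{b}_3$       3

Table 8: Construction of $\hat{u}_i$ in Encoding Example (1 of 3)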














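Vector         Bit Number    Vector            Bit Number
$\hat{u}_2$    3             $\hat{b}_4$       3
$\hat{u}_2$    2             $\hat{b}_5$       3
$\hat{u}_2$    1             $\hat{b}_6$       3
$\hat{u}_3$    12            $\hat{b}_7$       3
$\hat{u}_3$    11            $\hat{b}_8$       3
$\hat{u}_3$    10            $\hat{b}_9$       3
$\hat{u}_3$    9             $\hat{b}_{10}$    3
$\hat{u}_3$    8             $\hat{b}_{11}$    3
$\hat{u}_3$    7             $\hat{b}_{12}$    3
$\hat{u}_3$    6             $\hat{b}_{13}$    3
$\hat{u}_3$    5             $\hat{b}_{14}$    3
$\hat{u}_3$    4             $\hat{b}_{15}$    3
$\hat{u}_3$    3             $\hat{b}_{16}$    3
$\hat{u}_3$    2             $\hat{b}_3$       2
$\hat{u}_3$    1             $\hat{b}_4$       2
$\hat{u}_4$    11            $\hat{b}_1$       6
$\hat{u}_4$    10            $\hat{b}_1$       5
$\hat{u}_4$    9             $\hat{b}_1$       4
$\hat{u}_4$    8             $\hat{b}_1$       3
$\hat{u}_4$    7             $\hat{b}_1$       2
$\hat{u}_4$    6             $\hat{b}_1$       1
$\hat{u}_4$    5             $\hat{b}_2$       3
$\hat{u}_4$    4             $\hat{b}_2$       2
$\hat{u}_4$    3             $\hat{b}_5$       2
$\hat{u}_4$    2             $\hat{b}_6$       2
$\hat{u}_4$    1             $\hat{b}_7$       2
$\hat{u}_5$    11            $\hat{b}_8$       2
$\hat{u}_5$    10            $\hat{b}_9$       2
$\hat{u}_5$    9             $\hat{b}_{10}$    2
$\hat{u}_5$    8             $\hat{b}_{11}$    2
$\hat{u}_5$    7             $\hat{b}_{12}$    2
$\hat{u}_5$    6             $\hat{b}_{13}$    2

Table 9: Construction of $\hat{u}_i$ in Encoding Example (2 of 3)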
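Vector         Bit Number    Vector            Bit Number
$\hat{u}_5$    5             $\hat{b}_{14}$    2
$\hat{u}_5$    4             $\hat{b}_{15}$    2
$\hat{u}_5$    3             $\hat{b}_{16}$    2
$\hat{u}_5$    2             $\hat{b}_{17}$    2
$\hat{u}_5$    1             $\hat{b}_3$       1
$\hat{u}_6$    11            $\hat{b}_4$       1
$\hat{u}_6$    10            $\hat{b}_5$       1
$\hat{u}_6$    9             $\hat{b}_6$       1
$\hat{u}_6$    8             $\hat{b}_7$       1
$\hat{u}_6$    7             $\hat{b}_8$       1
$\hat{u}_6$    6             $\hat{b}_9$       1
$\hat{u}_6$    5             $\hat{b}_{10}$    1
$\hat{u}_6$    4             $\hat{b}_{11}$    1
$\hat{u}_6$    3             $\hat{b}_{12}$    1
$\hat{u}_6$    2             $\hat{b}_{13}$    1
$\hat{u}_6$    1             $\hat{b}_{14}$    1
$\hat{u}_7$    7             $\hat{b}_{15}$    1
$\hat{u}_7$    6             $\hat{b}_{16}$    1
$\hat{u}_7$    5             $\hat{b}_{17}$    1
$\hat{u}_7$    4             $\hat{b}_2$       1
$\hat{u}_7$    3             $\hat{b}_0$       2
$\hat{u}_7$    2             $\hat{b}_0$       1
$\hat{u}_7$    1             $\hat{b}_{18}$    1

Table 10: Construction of $\hat{u}_i$ in Encoding Example (3 of 3)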
11 Speech Synthesis

As was discussed in Section 5, the IMBE speech coder estimates a set of model parameters
for each speech frame. These parameters consist of the fundamental frequency $\hat{\omega}_0$, the V/UV
decisions for each frequency band $\hat{v}_k$, and the spectral amplitudes $\hat{M}_l$. After the transmitted
bits are received and decoded, a reconstructed set of model parameters is available for
synthesizing speech. These reconstructed model parameters (after parameter enhancement
and smoothing) are denoted $\tilde{\omega}_0$, $\tilde{v}_k$ and $\bar{M}_l$, and they correspond to the reconstructed
fundamental frequency, V/UV decisions and spectral amplitudes, respectively. In addition
the parameter $\tilde{L}$, defined as the number of spectral amplitudes in the current frame, is
generated from $\tilde{\omega}_0$ according to Equation (47). Because of a number of factors (such as
quantization and channel errors) the reconstructed model parameters are not the same as
the estimated model parameters $\hat{\omega}_0$, $\hat{v}_k$ and $\hat{M}_l$.


11.1 Speech Synthesis Notation 

The IMBE speech synthesis algorithm uses the reconstructed model parameters to generate
a speech signal which is perceptually similar to the original speech signal. For each new set
of model parameters, the synthesis algorithm generates a 20 ms frame of speech, $\tilde{s}(n)$, which
is interpolated between the previous set of model parameters and the newest or current set
of model parameters. The notation $\tilde{L}(0)$, $\tilde{\omega}_0(0)$, $\tilde{v}_k(0)$ and $\bar{M}_l(0)$ is used to denote the
current set of reconstructed model parameters, while the notation $\tilde{L}(-1)$, $\tilde{\omega}_0(-1)$, $\tilde{v}_k(-1)$
and $\bar{M}_l(-1)$ is used to denote the previous set of reconstructed model parameters. For each
new set of model parameters, $\tilde{s}(n)$ is generated in the range $0 \le n < N$, where $N$ equals
160 samples (20 ms). This synthetic speech signal is the output of the IMBE voice coder
and is suitable for digital to analog conversion with a sixteen bit converter.

The synthetic speech signal is divided into a voiced component $\tilde{s}_v(n)$ and an unvoiced
component $\tilde{s}_{uv}(n)$. These two components are synthesized separately, as shown in Figure 26,
and then summed to form $\tilde{s}(n)$. The unvoiced speech synthesis algorithm is discussed in
Section 11.2 and the voiced speech synthesis algorithm is discussed in Section 11.3.
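[Figure 26: block diagram. The MBE model parameters, including $\tilde{v}_k$ for $1 \le k \le \tilde{K}$ and $\bar{M}_l$ for
$1 \le l \le \tilde{L}$, feed the Unvoiced Speech Synthesis and Voiced Speech Synthesis blocks. Their
outputs $\tilde{s}_{uv}(n)$ and $\tilde{s}_v(n)$ are summed to give the synthetic speech $\tilde{s}(n)$.]

Fig. 26: IMBE Speech Synthesis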


11.2 Unvoiced Speech Synthesis 

The energy from unvoiced spectral amplitudes is synthesized with an unvoiced speech
synthesis algorithm. First a white noise sequence, $u(n)$, is generated. This noise sequence can
have an arbitrary mean. A recommended noise sequence [10] can be generated as shown
below.

$$u(n+1) = 171\,u(n) + 11213 - 53125 \left\lfloor \frac{171\,u(n) + 11213}{53125} \right\rfloor \tag{117}$$

The noise sequence is initialized to $u(-105) = 3147$.
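
A minimal Python sketch of the recursion in Equation (117) is given below. The function name and the choice to return a list are illustrative assumptions; the first element returned corresponds to $u(-105)$.

```python
def noise_sequence(count, seed=3147):
    """Return `count` successive noise values, starting from the seed."""
    u = seed
    values = []
    for _ in range(count):
        values.append(u)
        # The floor term in Equation (117) is a reduction modulo 53125.
        u = (171 * u + 11213) % 53125
    return values
```

Every value lies in the range [0, 53124], which is why the sequence has a non-zero mean.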

For each successive synthesis frame, $u(n)$ is shifted by 20 ms (160 samples) and
windowed with $w_S(n)$, which is given in Annex I. Since $w_S(n)$ has a non-zero length
of 209 samples, there is a 49 sample overlap between the noise signal used in successive
synthesis frames. Once the noise sequence has been shifted and windowed, the 256 point
Discrete Fourier Transform $U_w(m)$ is computed according to:

$$U_w(m) = \sum_{n=-104}^{104} u(n)\, w_S(n)\, e^{-j \frac{2\pi m n}{256}} \quad \text{for } -128 \le m \le 127 \tag{118}$$

The function $U_w(m)$ is generated in a manner which is analogous to $S_w(m)$ defined in
Equation (28) except that $u(n)$ and $w_S(n)$ are used in place of $s(n)$ and $w_R(n)$.

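The function $U_w(m)$ is then modified to create $\tilde{U}_w(m)$. For each $l$ in the range
$1 \le l \le \tilde{L}(0)$, $\tilde{U}_w(m)$ is computed according to Equation (119) if the l'th spectral amplitude is
voiced, and according to Equation (120) if the l'th spectral amplitude is unvoiced.

$$\tilde{U}_w(m) = 0 \quad \text{for } \lceil \tilde{a}_l \rceil \le |m| < \lceil \tilde{b}_l \rceil \tag{119}$$

$$\tilde{U}_w(m) = \frac{\gamma_w\, \bar{M}_l(0)\, U_w(m)}{\left[ \dfrac{\sum_{\eta = \lceil \tilde{a}_l \rceil}^{\lceil \tilde{b}_l \rceil - 1} |U_w(\eta)|^2}{\lceil \tilde{b}_l \rceil - \lceil \tilde{a}_l \rceil} \right]^{1/2}} \quad \text{for } \lceil \tilde{a}_l \rceil \le |m| < \lceil \tilde{b}_l \rceil \tag{120}$$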

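The unvoiced scaling coefficient $\gamma_w$ is a function of the synthesis window $w_S(n)$ and the
pitch refinement window $w_R(n)$. It is computed according to the formula:

$$\gamma_w = \left[ \sum_{n=-110}^{110} w_R(n) \right] \cdot \left[ \frac{\sum_{n=-104}^{104} w_S^2(n)}{\sum_{n=-110}^{110} w_R^2(n)} \right]^{1/2} \tag{121}$$

The frequency band edges $\tilde{a}_l$ and $\tilde{b}_l$ are computed from $\tilde{\omega}_0$ according to equations (122)
and (123), respectively.

$$\tilde{a}_l = \frac{256}{2\pi}\,(l - .5) \cdot \tilde{\omega}_0 \tag{122}$$

$$\tilde{b}_l = \frac{256}{2\pi}\,(l + .5) \cdot \tilde{\omega}_0 \tag{123}$$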

Finally, the very low frequency and very high frequency components of $\tilde{U}_w(m)$ are set
equal to zero as shown in the following equation.

$$\tilde{U}_w(m) = \begin{cases}
0 & \text{for } |m| < \lceil \tilde{a}_1 \rceil \\
0 & \text{for } \lceil \tilde{b}_{\tilde{L}} \rceil \le |m| \le 128
\end{cases} \tag{124}$$

The sequence $\tilde{u}_w(n)$, defined as the 256 point Inverse Discrete Fourier Transform of
$\tilde{U}_w(m)$, is the unvoiced speech for the current frame. The sequence $\tilde{u}_w(n)$ is computed as
shown in the following equation.

$$\tilde{u}_w(n) = \frac{1}{256} \sum_{m=-128}^{127} \tilde{U}_w(m)\, e^{j \frac{2\pi m n}{256}} \quad \text{for } -128 \le n \le 127 \tag{125}$$

In order to generate $\tilde{s}_{uv}(n)$, $\tilde{u}_w(n)$ must be combined with the unvoiced speech from the
previous frame. This is accomplished using the Weighted Overlap Add algorithm described
in [4]. If $\tilde{u}_w(n, 0)$ is used to denote the unvoiced speech for the current frame, and $\tilde{u}_w(n, -1)$
is used to denote the unvoiced speech for the previous frame, then $\tilde{s}_{uv}(n)$ is given by

$$\tilde{s}_{uv}(n) = \frac{w_S(n)\, \tilde{u}_w(n, -1) + w_S(n - N)\, \tilde{u}_w(n - N, 0)}{w_S^2(n) + w_S^2(n - N)} \quad \text{for } 0 \le n < N \tag{126}$$

In this equation $w_S(n)$ is assumed to be zero outside the range $-105 \le n \le 105$, and
$\tilde{u}_w(n, 0)$ and $\tilde{u}_w(n, -1)$ are assumed to be zero outside the range $-128 \le n \le 127$.
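
Equation (126) can be sketched in Python as shown below. The function names are hypothetical, and the trapezoidal `stand_in_window` is only a placeholder with the same support as $w_S(n)$; the actual window values are those tabulated in Annex I. The previous and current unvoiced signals are passed as functions of the sample index so that negative indices can be used directly.

```python
def stand_in_window(n):
    """Placeholder synthesis window (NOT the Annex I values)."""
    a = abs(n)
    if a <= 55:
        return 1.0
    if a < 105:
        return (105 - a) / 50.0
    return 0.0


def weighted_overlap_add(u_prev, u_cur, w_s, N=160):
    """Combine previous and current unvoiced frames per Equation (126)."""
    out = []
    for n in range(N):
        a = w_s(n)          # weight on the previous frame
        b = w_s(n - N)      # weight on the current frame
        # The denominator is non-zero as long as one window is non-zero at n.
        out.append((a * u_prev(n) + b * u_cur(n - N)) / (a * a + b * b))
    return out
```

Because the two shifted windows overlap for the middle samples of the frame, every sample in $0 \le n < N$ receives a contribution from at least one of the two frames.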
11.3 Voiced Speech Synthesis

The voiced speech is synthesized by summing a voiced signal for each spectral amplitude
according to the following equation.

$$\tilde{s}_v(n) = \sum_{l=1}^{\max[\tilde{L}(-1),\, \tilde{L}(0)]} 2 \cdot \tilde{s}_{v,l}(n) \quad \text{for } 0 \le n < N \tag{127}$$

The reader is referred to references [1, 3] for background information on the algorithm
described in this section. The voiced synthesis algorithm attempts to match the l'th spectral
amplitude of the current frame with the l'th spectral amplitude of the previous frame. The
algorithm assumes that all spectral amplitudes outside the allowed range are equal to zero
as shown in Equations (128) and (129).

$$\bar{M}_l(0) = 0 \quad \text{for } l > \tilde{L}(0) \tag{128}$$

$$\bar{M}_l(-1) = 0 \quad \text{for } l > \tilde{L}(-1) \tag{129}$$

In addition it assumes that these spectral amplitudes are unvoiced. These assumptions are
needed for the case where the number of spectral amplitudes in the current frame is not
equal to the number of spectral amplitudes in the previous frame (i.e. $\tilde{L}(0) \ne \tilde{L}(-1)$).

The signal $\tilde{s}_{v,l}(n)$ is computed differently for each spectral amplitude. If the l'th spectral
amplitude is unvoiced for both the previous and current speech frame then $\tilde{s}_{v,l}(n)$ is set
equal to zero as shown in the following equation. In this case the energy in this region of
the spectrum is completely synthesized by the unvoiced synthesis algorithm described in
the previous section.

$$\tilde{s}_{v,l}(n) = 0 \quad \text{for } 0 \le n < N \tag{130}$$

Alternatively, if the l'th spectral amplitude is unvoiced for the current frame and voiced
for the previous frame, then $\tilde{s}_{v,l}(n)$ is given by the following equation. In this case the
energy in this region of the spectrum transitions from the voiced synthesis algorithm to the
unvoiced synthesis algorithm.

$$\tilde{s}_{v,l}(n) = w_S(n)\, \bar{M}_l(-1) \cos[\tilde{\omega}_0(-1)\, n\, l + \phi_l(-1)] \quad \text{for } 0 \le n < N \tag{131}$$

Similarly, if the l'th spectral amplitude is voiced for the current frame and unvoiced for
the previous frame then $\tilde{s}_{v,l}(n)$ is given by the following equation. In this case the energy in
this region of the spectrum transitions from the unvoiced synthesis algorithm to the voiced
synthesis algorithm.

$$\tilde{s}_{v,l}(n) = w_S(n - N)\, \bar{M}_l(0) \cos[\tilde{\omega}_0(0)(n - N)\, l + \phi_l(0)] \quad \text{for } 0 \le n < N \tag{132}$$

Otherwise, if the l'th spectral amplitude is voiced for both the current and the previous
frame, and if either $l \ge 8$ or $|\tilde{\omega}_0(0) - \tilde{\omega}_0(-1)| \ge .1\,\tilde{\omega}_0(0)$, then $\tilde{s}_{v,l}(n)$ is given by the
following equation. In this case the energy in this region of the spectrum is completely
synthesized by the voiced synthesis algorithm.

$$\tilde{s}_{v,l}(n) = w_S(n)\, \bar{M}_l(-1) \cos[\tilde{\omega}_0(-1)\, n\, l + \phi_l(-1)] + w_S(n - N)\, \bar{M}_l(0) \cos[\tilde{\omega}_0(0)(n - N)\, l + \phi_l(0)] \tag{133}$$

The variable $n$ is restricted to the range $0 \le n < N$. The synthesis window $w_S(n)$ used
in Equations (131), (132) and (133) is assumed to be equal to zero outside the range
$-105 \le n \le 105$.

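A final rule is used if the $l$'th spectral amplitude is voiced for both the current and the 
previous frame, and if both $l < 8$ and $|\tilde{\omega}_0(0) - \tilde{\omega}_0(-1)| < .1\,\tilde{\omega}_0(0)$. In this case $\tilde{s}_{v,l}(n)$ is 
given by the following equation, and the energy in this region of the spectrum is completely 
synthesized by the voiced synthesis algorithm. 

$$\tilde{s}_{v,l}(n) = a_l(n)\cos[\theta_l(n)] \qquad \text{for } 0 \le n < N \tag{134}$$

The amplitude function $a_l(n)$ is given by, 

$$a_l(n) = \bar{M}_l(-1) + \frac{n}{N}\,[\bar{M}_l(0) - \bar{M}_l(-1)] \tag{135}$$

and the phase function $\theta_l(n)$ is given by Equations (136) through (138). 

$$\theta_l(n) = \phi_l(-1) + [\tilde{\omega}_0(-1)\cdot l + \Delta\omega_l(0)]\,n + [\tilde{\omega}_0(0) - \tilde{\omega}_0(-1)]\cdot\frac{l\,n^2}{2N} \tag{136}$$

$$\Delta\phi_l(0) = \phi_l(0) - \phi_l(-1) - [\tilde{\omega}_0(-1) + \tilde{\omega}_0(0)]\cdot\frac{lN}{2} \tag{137}$$

$$\Delta\omega_l(0) = \frac{1}{N}\left[\Delta\phi_l(0) - 2\pi\left\lfloor\frac{\Delta\phi_l(0) + \pi}{2\pi}\right\rfloor\right] \tag{138}$$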
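The case analysis of Equations (130) through (138) can be sketched for a single harmonic as follows. This is a minimal illustration, not a reference implementation. The frame length `N = 160`, the closed-form trapezoid used for `synthesis_window`, and every function and argument name are assumptions made for the example. The phases `phi_prev` and `phi_cur` are taken as given inputs.

```python
import math

N = 160  # assumed frame length in samples (not stated in this section)


def synthesis_window(n):
    """Assumed trapezoidal w_S(n): 1 for |n| <= 55, linear down to 0 at |n| = 105."""
    a = abs(n)
    if a <= 55:
        return 1.0
    if a <= 105:
        return (105 - a) / 50.0
    return 0.0


def voiced_harmonic(l, n, voiced_prev, voiced_cur,
                    m_prev, m_cur, w0_prev, w0_cur, phi_prev, phi_cur):
    """Contribution of harmonic l to the voiced signal at sample n."""
    if not voiced_prev and not voiced_cur:
        return 0.0  # Equation (130): left to the unvoiced synthesis
    # Windowed sinusoid carried over from the previous frame.
    prev_term = synthesis_window(n) * m_prev * math.cos(w0_prev * n * l + phi_prev)
    # Windowed sinusoid belonging to the current frame.
    cur_term = synthesis_window(n - N) * m_cur * math.cos(w0_cur * (n - N) * l + phi_cur)
    if voiced_prev and not voiced_cur:
        return prev_term                # Equation (131)
    if voiced_cur and not voiced_prev:
        return cur_term                 # Equation (132)
    if l >= 8 or abs(w0_cur - w0_prev) >= 0.1 * w0_cur:
        return prev_term + cur_term     # Equation (133): overlap-add
    # Equations (134)-(138): interpolate amplitude and phase across the frame.
    dphi = phi_cur - phi_prev - (w0_prev + w0_cur) * l * N / 2.0
    dw = (dphi - 2.0 * math.pi * math.floor((dphi + math.pi) / (2.0 * math.pi))) / N
    amp = m_prev + (n / N) * (m_cur - m_prev)
    theta = (phi_prev + (w0_prev * l + dw) * n
             + (w0_cur - w0_prev) * l * n * n / (2.0 * N))
    return amp * math.cos(theta)
```

In the interpolated case the phase starts at $\phi_l(-1)$ for $n = 0$ and reaches $\phi_l(0)$ modulo $2\pi$ at $n = N$, which is why the correction term $\Delta\omega_l(0)$ is wrapped into $[-\pi/N, \pi/N)$.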
The phase parameter $\phi_l$ which is used in the above equations must be updated for each 
frame using Equations (139) through (141). The notation $\phi_l(0)$ and $\psi_l(0)$ refers to the 
parameter values in the current frame, while $\phi_l(-1)$ and $\psi_l(-1)$ denote their counterparts 
in the previous frame. 

$$\psi_l(0) = \psi_l(-1) + [\tilde{\omega}_0(-1) + \tilde{\omega}_0(0)]\cdot\frac{lN}{2} \qquad \text{for } 1 \le l \le 56 \tag{139}$$

$$\phi_l(0) = \begin{cases} \psi_l(0) & \text{for } 1 \le l \le \lfloor \tilde{L}/4 \rfloor \\ \psi_l(0) + \dfrac{\tilde{L}_{uv}(0)\,\rho_l(0)}{\tilde{L}(0)} & \text{for } \lfloor \tilde{L}/4 \rfloor < l \le \max[\tilde{L}(-1), \tilde{L}(0)] \end{cases} \tag{140}$$


The parameter $\tilde{L}_{uv}(0)$ is equal to the number of unvoiced spectral amplitudes in the current 
frame, and the parameter $\rho_l(0)$ used in Equation (140) is defined to be a random number 
which is uniformly distributed in the interval $[-\pi, \pi)$. This random number can be generated 
using the following equation, 

$$\rho_l(0) = \frac{2\pi}{53125}\,u(l) - \pi \tag{141}$$

where $u(l)$ refers to the shifted noise sequence for the current frame, which is described in 
Section 11.2. 

Note that $\psi_l(0)$ must be updated every frame using Equation (139) for $1 \le l \le 56$, 
regardless of the value of $\tilde{L}$ or the value of the V/UV decisions. 
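The per-frame phase update of Equations (139) through (141) might be sketched as below. This is an illustrative sketch only: the frame length `N = 160`, the list layout (index 0 unused), and the names `update_phases` and `u` are assumptions for the example, and `u` stands in for the shifted noise sequence of Section 11.2. Phases beyond `max(L_prev, L_cur)` are simply left at zero here, a choice of this sketch.

```python
import math

N = 160  # assumed frame length in samples (not stated in this section)


def update_phases(psi_prev, w0_prev, w0_cur, L_prev, L_cur, L_uv, u):
    """Return (psi, phi) for the current frame, as lists indexed 1..56.

    psi_prev: list of 57 floats holding psi_l(-1) at index l.
    u:        callable giving the noise sample u(l) in [0, 53125).
    """
    psi = [0.0] * 57
    phi = [0.0] * 57
    # Equation (139): always updated for all 56 harmonics.
    for l in range(1, 57):
        psi[l] = psi_prev[l] + (w0_prev + w0_cur) * l * N / 2.0
    quarter = L_cur // 4
    for l in range(1, max(L_prev, L_cur) + 1):
        if l <= quarter:
            phi[l] = psi[l]                                   # Equation (140), first case
        else:
            rho = (2.0 * math.pi / 53125.0) * u(l) - math.pi  # Equation (141)
            phi[l] = psi[l] + L_uv * rho / L_cur              # Equation (140), second case
    return psi, phi
```

With a fully voiced frame (`L_uv = 0`) the random term vanishes and `phi` equals `psi` for every harmonic in use; the more unvoiced amplitudes a frame has, the more the upper harmonics' phases are randomized.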

Once $\tilde{s}_{v,l}(n)$ is generated for each spectral amplitude the complete voiced component is 
generated according to Equation (127). The synthetic speech signal is then computed by 
summing the voiced and unvoiced components as shown in Equation (142). 

$$\tilde{s}(n) = \tilde{s}_{uv}(n) + \tilde{s}_v(n) \qquad \text{for } 0 \le n < N \tag{142}$$


This completes the IMBE speech synthesis algorithm. 

Algorithm            Delay (ms)
Analysis             73.75
Quantization          0.0
FEC/Interleaving      0.0
Reconstruction        0.0
Synthesis             6.25

Table 11: Breakdown of Algorithmic Delay 

12 Additional Notes 

The total algorithmic delay is 80 ms. This does not include any processing delay or 
transmission delay. The breakdown of the delay is shown in Table 11. The analysis delay is due 
to the filtering, windowing and two frame look-ahead used in the initial pitch estimation 
algorithm. The synthesis delay is introduced by the manner in which the synthesis algorithm 
smoothly transitions between the parameters estimated for consecutive speech frames. 
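As a quick arithmetic check, the entries of Table 11 can be summed and converted to samples. The 8 kHz sampling rate used for the conversion is an assumption of this sketch (it is not stated in this section), as are the variable names.

```python
# Per-stage algorithmic delay in milliseconds, copied from Table 11.
delays_ms = {
    "Analysis": 73.75,
    "Quantization": 0.0,
    "FEC/Interleaving": 0.0,
    "Reconstruction": 0.0,
    "Synthesis": 6.25,
}

SAMPLE_RATE_HZ = 8000  # assumed sampling rate, used only for the conversion

total_ms = sum(delays_ms.values())
delays_samples = {name: ms * SAMPLE_RATE_HZ / 1000 for name, ms in delays_ms.items()}

print(total_ms)
print(delays_samples["Analysis"], delays_samples["Synthesis"])
```

Under that assumption the analysis and synthesis delays correspond to whole numbers of samples, and the total matches the 80 ms stated in the text.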

In a few of the figures and flow charts, the variable x is equivalent to the variable x. 
For example the variable v in Figure 15 refers to the variable v in the text. This notational 
discrepancy is a consequence of the graphical software used to produce this document. 
Annex A Variable Initialization 

Variable                     Initial Value
$\hat{P}_{-1}$               100
$\hat{P}_{-2}$               100
$E_{-1}(P)$                  0 for all $P$
$E_{-2}(P)$                  0 for all $P$
$\xi_{max}$                  100000
$\tilde{\omega}_0(-1)$       $.02985\pi$
$\tilde{M}_l(-1)$            1 for all $l$
$\bar{M}_l(-1)$              0 for all $l$
$\tilde{L}(-1)$              30
$\tilde{K}(-1)$              10
$\tilde{v}_k(-1)$            0 for all $k$
$\hat{v}_k(-1)$              0 for all $k$
$\epsilon_R(-1)$             0.0
$S_E$                        75000
$u(n)$                       $u(-105) = 3147$
$\tilde{u}_w(n, -1)$         0 for all $n$
$\psi_l(-1)$                 0 for all $l$
$\phi_l(-1)$                 0 for all $l$
Annex B Initial Pitch Estimation Window 

   n  $w_I(n)$         n  $w_I(n)$         n  $w_I(n)$         n  $w_I(n)$
-150  0.00270174    -110  0.02113325     -70  0.05359430     -30  0.08371302
-149  0.00295485    -109  0.02181198     -69  0.05446897     -29  0.08424167
-148  0.00321783    -108  0.02250030     -68  0.05534209     -28  0.08475497
-147  0.00349080    -107  0.02319806     -67  0.05621329     -27  0.08525267
-146  0.00377385    -106  0.02390509     -66  0.05708219     -26  0.08573447
-145  0.00406710    -105  0.02462122     -65  0.05794842     -25  0.08620022
-144  0.00437064    -104  0.02534628     -64  0.05881159     -24  0.08664963
-143  0.00468457    -103  0.02608007     -63  0.05967136     -23  0.08708246
-142  0.00500898    -102  0.02682242     -62  0.06052732     -22  0.08749853
-141  0.00534396    -101  0.02757312     -61  0.06137912     -21  0.08789764
-140  0.00568957    -100  0.02833197     -60  0.06222639     -20  0.08827950
-139  0.00604590     -99  0.02909875     -59  0.06306869     -19  0.08864402
-138  0.00641300     -98  0.02987323     -58  0.06390573     -18  0.08899096
-137  0.00679095     -97  0.03065519     -57  0.06473708     -17  0.08932017
-136  0.00717979     -96  0.03144442     -56  0.06556234     -16  0.08963151
-135  0.00757957     -95  0.03224064     -55  0.06638119     -15  0.08992471
-134  0.00799034     -94  0.03304362     -54  0.06719328     -14  0.09019975
-133  0.00841213     -93  0.03385311     -53  0.06799815     -13  0.09045644
-132  0.00884496     -92  0.03466884     -52  0.06879549     -12  0.09069464
-131  0.00928887     -91  0.03549054     -51  0.06958490     -11  0.09091420
-130  0.00974387     -90  0.03631795     -50  0.07036604     -10  0.09111510
-129  0.01020996     -89  0.03715080     -49  0.07113854      -9  0.09129713
-128  0.01068715     -88  0.03798876     -48  0.07190202      -8  0.09146029
-127  0.01117544     -87  0.03883159     -47  0.07265611      -7  0.09160442
-126  0.01167480     -86  0.03967896     -46  0.07340052      -6  0.09172948
-125  0.01218523     -85  0.04053057     -45  0.07413481      -5  0.09183547
-124  0.01270669     -84  0.04138612     -44  0.07485873      -4  0.09192220
-123  0.01323915     -83  0.04224531     -43  0.07557184      -3  0.09198972
-122  0.01378257     -82  0.04310780     -42  0.07627381      -2  0.09203795
-121  0.01433691     -81  0.04397328     -41  0.07696434      -1  0.09206691
-120  0.01490209     -80  0.04484143     -40  0.07764313       0  0.09207659
-119  0.01547807     -79  0.04571191     -39  0.07830978       1  0.09206691
-118  0.01606477     -78  0.04658438     -38  0.07896401       2  0.09203795
-117  0.01666212     -77  0.04745849     -37  0.07960547       3  0.09198972
-116  0.01727001     -76  0.04833391     -36  0.08023385       4  0.09192220
-115  0.01788837     -75  0.04921030     -35  0.08084887       5  0.09183547
-114  0.01851709     -74  0.05008731     -34  0.08145022       6  0.09172948
-113  0.01915605     -73  0.05096456     -33  0.08203757       7  0.09160442
-112  0.01980515     -72  0.05184171     -32  0.08261070       8  0.09146029
-111  0.02046427     -71  0.05271840     -31  0.08316927       9  0.09129713
Annex C Pitch Refinement Window 

   n  $w_R(n)$       n  $w_R(n)$       n  $w_R(n)$       n  $w_R(n)$       n  $w_R(n)$
-110  0.014873     -78  0.205355     -46  0.607067     -14  0.956477      18  0.928916
-109  0.017397     -77  0.215294     -45  0.620807     -13  0.962377      19  0.921074
-108  0.020102     -76  0.225466     -44  0.634490     -12  0.967866      20  0.912868
-107  0.022995     -75  0.235869     -43  0.648105     -11  0.972940      21  0.904307
-106  0.026081     -74  0.246497     -42  0.661638     -10  0.977592      22  0.895400
-105  0.029365     -73  0.257347     -41  0.675076      -9  0.981817      23  0.886157
-104  0.032852     -72  0.268413     -40  0.688406      -8  0.985610      24  0.876589
-103  0.036546     -71  0.279689     -39  0.701616      -7  0.988967      25  0.866705
-102  0.040451     -70  0.291171     -38  0.714692      -6  0.991884      26  0.856516
-101  0.044573     -69  0.302851     -37  0.727620      -5  0.994358      27  0.846033
-100  0.048915     -68  0.314724     -36  0.740390      -4  0.996386      28  0.835267
 -99  0.053482     -67  0.326782     -35  0.752986      -3  0.997966      29  0.824231
 -98  0.058277     -66  0.339018     -34  0.765397      -2  0.999095      30  0.812935
 -97  0.063303     -65  0.351425     -33  0.777610      -1  0.999774      31  0.801391
 -96  0.068563     -64  0.363994     -32  0.789612       0  1.000000      32  0.789612
 -95  0.074062     -63  0.376718     -31  0.801391       1  0.999774      33  0.777610
 -94  0.079801     -62  0.389588     -30  0.812935       2  0.999095      34  0.765397
 -93  0.085782     -61  0.402594     -29  0.824231       3  0.997966      35  0.752986
 -92  0.092009     -60  0.415727     -28  0.835267       4  0.996386      36  0.740390
 -91  0.098483     -59  0.428978     -27  0.846033       5  0.994358      37  0.727620
 -90  0.105205     -58  0.442337     -26  0.856516       6  0.991884      38  0.714692
 -89  0.112176     -57  0.455793     -25  0.866705       7  0.988967      39  0.701616
 -88  0.119398     -56  0.469336     -24  0.876589       8  0.985610      40  0.688406
 -87  0.126872     -55  0.482955     -23  0.886157       9  0.981817      41  0.675076
 -86  0.134596     -54  0.496640     -22  0.895400      10  0.977592      42  0.661638
 -85  0.142572     -53  0.510379     -21  0.904307      11  0.972940      43  0.648105
 -84  0.150799     -52  0.524160     -20  0.912868      12  0.967866      44  0.634490
 -83  0.159276     -51  0.537971     -19  0.921074      13  0.962377      45  0.620807
 -82  0.168001     -50  0.551802     -18  0.928916      14  0.956477      46  0.607067
 -81  0.176974     -49  0.565639     -17  0.936386      15  0.950174      47  0.593284
 -80  0.186192     -48  0.579470     -16  0.943474      16  0.943474      48  0.579470
 -79  0.195653     -47  0.593284     -15  0.950174      17  0.936386      49  0.565639

   n  $w_R(n)$       n  $w_R(n)$
  50  0.551802      82  0.168001
  51  0.537971      83  0.159276
  52  0.524160      84  0.150799
  53  0.510379      85  0.142572
  54  0.496640      86  0.134596
  55  0.482955      87  0.126872
  56  0.469336      88  0.119398
  57  0.455793      89  0.112176
  58  0.442337      90  0.105205
  59  0.428978      91  0.098483
  60  0.415727      92  0.092009
  61  0.402594      93  0.085782
  62  0.389588      94  0.079801
  63  0.376718      95  0.074062
  64  0.363994      96  0.068563
  65  0.351425      97  0.063303
  66  0.339018      98  0.058277
  67  0.326782      99  0.053482
  68  0.314724     100  0.048915
  69  0.302851     101  0.044573
  70  0.291171     102  0.040451
  71  0.279689     103  0.036546
  72  0.268413     104  0.032852
  73  0.257347     105  0.029365
  74  0.246497     106  0.026081
  75  0.235869     107  0.022995
  76  0.225466     108  0.020102
  77  0.215294     109  0.017397
  78  0.205355     110  0.014873
  79  0.195653
  80  0.186192
  81  0.176974
Annex D FIR Low Pass Filter 

   n  $h_{LPF}(n)$
 -10  -.002898
  -9  -.002831
  -8   .005666
  -7   .016601
  -6   .008800
  -5  -.026955
  -4  -.055990
  -3  -.015116
  -2   .118754
  -1   .278990
   0   .351338
   1   .278990
   2   .118754
   3  -.015116
   4  -.055990
   5  -.026955
   6   .008800
   7   .016601
   8   .005666
   9  -.002831
  10  -.002898
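The tabulated coefficients can be applied as an ordinary 21-tap, zero-phase FIR filter. The sketch below is illustrative only: the function name `lowpass`, the zero-padding at the signal edges, and the list layout are assumptions of the example, not requirements taken from this annex.

```python
# Coefficients h_LPF(n) for n = -10..10, copied from the Annex D table.
H_LPF = [
    -0.002898, -0.002831, 0.005666, 0.016601, 0.008800,
    -0.026955, -0.055990, -0.015116, 0.118754, 0.278990,
    0.351338,
    0.278990, 0.118754, -0.015116, -0.055990, -0.026955,
    0.008800, 0.016601, 0.005666, -0.002831, -0.002898,
]


def lowpass(x):
    """Return y(n) = sum over k = -10..10 of h_LPF(k) * x(n - k).

    Samples outside the input are treated as zero (an assumption of this sketch).
    """
    half = len(H_LPF) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(-half, half + 1):
            i = n - k
            if 0 <= i < len(x):
                acc += H_LPF[k + half] * x[i]
        y.append(acc)
    return y


dc_gain = sum(H_LPF)  # close to 1, so a constant input passes almost unchanged
```

Because the coefficients are symmetric about $n = 0$, the filter has linear (zero) phase, and the coefficient sum of about 1.00138 is its gain at DC.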
Annex E Gain Quantizer Levels 

$b_2$  Quantizer Level    $b_2$  Quantizer Level
   0   -2.842205             32    2.653909
   1   -2.694235             33    2.780654
   2   -2.558260             34    2.925355
   3   -2.382850             35    3.076390
   4   -2.221042             36    3.220825
   5   -2.095574             37    3.402869
   6   -1.980845             38    3.585096
   7   -1.836058             39    3.784606
   8   -1.645556             40    3.955521
   9   -1.417658             41    4.155636
  10   -1.261301             42    4.314009
  11   -1.125631             43    4.444150
  12   -0.958207             44    4.577542
  13   -0.781591             45    4.735552
  14   -0.555837             46    4.909493
  15   -0.346976             47    5.085264
  16   -0.147249             48    5.254767
  17    0.027755             49    5.411894
  18    0.211495             50    5.568094
  19    0.388380             51    5.738523
  20    0.552873             52    5.919215
  21    0.737223             53    6.087701
  22    0.932197             54    6.280685
  23    1.139032             55    6.464201
  24    1.320955             56    6.647736
  25    1.483433             57    6.834672
  26    1.648297             58    7.022583
  27    1.801447             59    7.211777
  28    1.942731             60    7.471016
  29    2.118613             61    7.738948
  30    2.321486             62    8.124863
  31    2.504443             63    8.695827
Annex F Bit Allocation and Step Size for Transformed Gain Vector 

 L  $G_{m-1}$  $b_m$  $B_m$  $\Delta_m$       L  $G_{m-1}$  $b_m$  $B_m$  $\Delta_m$
 9  $G_2$  $b_3$  10  0.003100               15  $G_2$  $b_3$  7  0.024800
 9  $G_3$  $b_4$   9  0.004020               15  $G_3$  $b_4$  6  0.030150
 9  $G_4$  $b_5$   9  0.003360               15  $G_4$  $b_5$  6  0.025200
 9  $G_5$  $b_6$   9  0.002900               15  $G_5$  $b_6$  6  0.021750
 9  $G_6$  $b_7$   9  0.002640               15  $G_6$  $b_7$  5  0.036960
10  $G_2$  $b_3$   9  0.006200               16  $G_2$  $b_3$  6  0.046500
10  $G_3$  $b_4$   9  0.004020               16  $G_3$  $b_4$  6  0.030150
10  $G_4$  $b_5$   8  0.006720               16  $G_4$  $b_5$  6  0.025200
10  $G_5$  $b_6$   8  0.005800               16  $G_5$  $b_6$  5  0.040600
10  $G_6$  $b_7$   8  0.005280               16  $G_6$  $b_7$  5  0.036960
11  $G_2$  $b_3$   8  0.012400               17  $G_2$  $b_3$  6  0.046500
11  $G_3$  $b_4$   8  0.008040               17  $G_3$  $b_4$  6  0.030150
11  $G_4$  $b_5$   8  0.006720               17  $G_4$  $b_5$  5  0.047040
11  $G_5$  $b_6$   7  0.011600               17  $G_5$  $b_6$  5  0.040600
11  $G_6$  $b_7$   7  0.010560               17  $G_6$  $b_7$  5  0.036960
12  $G_2$  $b_3$   8  0.012400               18  $G_2$  $b_3$  6  0.046500
12  $G_3$  $b_4$   7  0.016080               18  $G_3$  $b_4$  5  0.056280
12  $G_4$  $b_5$   7  0.013440               18  $G_4$  $b_5$  5  0.047040
12  $G_5$  $b_6$   7  0.011600               18  $G_5$  $b_6$  5  0.040600
12  $G_6$  $b_7$   7  0.010560               18  $G_6$  $b_7$  5  0.036960
13  $G_2$  $b_3$   7  0.024800               19  $G_2$  $b_3$  6  0.046500
13  $G_3$  $b_4$   7  0.016080               19  $G_3$  $b_4$  5  0.056280
13  $G_4$  $b_5$   7  0.013440               19  $G_4$  $b_5$  5  0.047040
13  $G_5$  $b_6$   6  0.021750               19  $G_5$  $b_6$  4  0.058000
13  $G_6$  $b_7$   6  0.019800               19  $G_6$  $b_7$  4  0.052800
14  $G_2$  $b_3$   7  0.024800               20  $G_2$  $b_3$  6  0.046500
14  $G_3$  $b_4$   6  0.030150               20  $G_3$  $b_4$  5  0.056280
14  $G_4$  $b_5$   6  0.025200               20  $G_4$  $b_5$  5  0.047040
14  $G_5$  $b_6$   6  0.021750               20  $G_5$  $b_6$  4  0.058000
14  $G_6$  $b_7$   6  0.019800               20  $G_6$  $b_7$  4  0.052800

 L  $G_{m-1}$  $b_m$  $B_m$  $\Delta_m$       L  $G_{m-1}$  $b_m$  $B_m$  $\Delta_m$
21  $G_2$  $b_3$   5  0.086800               27  $G_2$  $b_3$  5  0.086800
21  $G_3$  $b_4$   5  0.056280               27  $G_3$  $b_4$  4  0.080400
21  $G_4$  $b_5$   5  0.047040               27  $G_4$  $b_5$  4  0.067200
21  $G_5$  $b_6$   4  0.058000               27  $G_5$  $b_6$  3  0.094250
21  $G_6$  $b_7$   4  0.052800               27  $G_6$  $b_7$  3  0.085800
22  $G_2$  $b_3$   5  0.086800               28  $G_2$  $b_3$  4  0.124000
22  $G_3$  $b_4$   5  0.056280               28  $G_3$  $b_4$  4  0.080400
22  $G_4$  $b_5$   4  0.067200               28  $G_4$  $b_5$  4  0.067200
22  $G_5$  $b_6$   4  0.058000               28  $G_5$  $b_6$  3  0.094250
22  $G_6$  $b_7$   4  0.052800               28  $G_6$  $b_7$  3  0.085800
23  $G_2$  $b_3$   5  0.086800               29  $G_2$  $b_3$  4  0.124000
23  $G_3$  $b_4$   4  0.080400               29  $G_3$  $b_4$  4  0.080400
23  $G_4$  $b_5$   4  0.067200               29  $G_4$  $b_5$  4  0.067200
23  $G_5$  $b_6$   4  0.058000
Annex G Bit Allocation for Higher Order DCT Coefficients 

   n  $w_S(n)$       n  $w_S(n)$       n  $w_S(n)$       n  $w_S(n)$       n  $w_S(n)$
                                                        11  1.000000      42  1.000000
 -81  0.480000     -50  1.000000     -19  1.000000      12  1.000000      43  1.000000
 -80  0.500000     -49  1.000000     -18  1.000000      13  1.000000      44  1.000000
 -79  0.520000     -48  1.000000     -17  1.000000      14  1.000000      45  1.000000
 -78  0.540000     -47  1.000000     -16  1.000000      15  1.000000      46  1.000000
 -77  0.560000     -46  1.000000     -15  1.000000      16  1.000000      47  1.000000
 -76  0.580000     -45  1.000000     -14  1.000000      17  1.000000      48  1.000000
 -75  0.600000     -44  1.000000     -13  1.000000      18  1.000000      49  1.000000

   n  $w_S(n)$       n  $w_S(n)$
  50  1.000000      81  0.480000
  51  1.000000      82  0.460000
  52  1.000000      83  0.440000
  53  1.000000      84  0.420000
  54  1.000000      85  0.400000
  55  1.000000      86  0.380000
  56  0.980000      87  0.360000
  57  0.960000      88  0.340000
  58  0.940000      89  0.320000
  59  0.920000      90  0.300000
  60  0.900000      91  0.280000
  61  0.880000      92  0.260000
  62  0.860000      93  0.240000
  63  0.840000      94  0.220000
  64  0.820000      95  0.200000
  65  0.800000      96  0.180000
  66  0.780000      97  0.160000
  67  0.760000      98  0.140000
  68  0.740000      99  0.120000
  69  0.720000     100  0.100000
  70  0.700000     101  0.080000
  71  0.680000     102  0.060000
  72  0.660000     103  0.040000
  73  0.640000     104  0.020000
  74  0.620000     105  0.000000
  75  0.600000
  76  0.580000
  77  0.560000
  78  0.540000
  79  0.520000
  80  0.500000
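The tabulated synthesis window values follow a simple trapezoid: 1 for $|n| \le 55$, then falling by 0.02 per sample to 0 at $|n| = 105$. The sketch below reproduces the table from that closed form and checks that overlapping windows sum to one. The closed form is inferred from the tabulated values rather than quoted from the text, and the frame length `N = 160` and the function name are assumptions of the example.

```python
N = 160  # assumed frame length in samples (not stated in this annex)


def synthesis_window(n):
    """Closed form inferred from the table: flat top for |n| <= 55, linear taper to |n| = 105."""
    a = abs(n)
    if a <= 55:
        return 1.0
    if a <= 105:
        return (105 - a) / 50.0
    return 0.0


# Overlap-add check: the tail of one window plus the head of the next is 1
# at every sample of the frame, so amplitudes are preserved across frames.
overlap_ok = all(
    abs(synthesis_window(n) + synthesis_window(n - N) - 1.0) < 1e-12
    for n in range(N)
)
```

Under the assumed frame length, this complementary shape is what lets the windowed previous-frame and current-frame sinusoids be added without a dip or bump in level.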
Annex J Log Magnitude Prediction Residual Block Lengths 

 L  $J_1$ $J_2$ $J_3$ $J_4$ $J_5$ $J_6$       L  $J_1$ $J_2$ $J_3$ $J_4$ $J_5$ $J_6$
 9    1    1    1    2    2    2              34    5    5    6    6    6    6
10    1    1    2    2    2    2              35    5    6    6    6    6    6
11    1    2    2    2    2    2              36    6    6    6    6    6    6
12    2    2    2    2    2    2              37    6    6    6    6    6    7
13    2    2    2    2    2    3              38    6    6    6    6    7    7
14    2    2    2    2    3    3              39    6    6    6    7    7    7
15    2    2    2    3    3    3              40    6    6    7    7    7    7
16    2    2    3    3    3    3              41    6    7    7    7    7    7
17    2    3    3    3    3    3              42    7    7    7    7    7    7
18    3    3    3    3    3    3              43    7    7    7    7    7    8
19    3    3    3    3    3    4              44    7    7    7    7    8    8
20    3    3    3    3    4    4              45    7    7    7    8    8    8
21    3    3    3    4    4    4              46    7    7    8    8    8    8
22    3    3    4    4    4    4              47    7    8    8    8    8    8
23    3    4    4    4    4    4              48    8    8    8    8    8    8
24    4    4    4    4    4    4              49    8    8    8    8    8    9
25    4    4    4    4    4    5              50    8    8    8    8    9    9
26    4    4    4    4    5    5              51    8    8    8    9    9    9
27    4    4    4    5    5    5              52    8    8    9    9    9    9
28    4    4    5    5    5    5              53    8    9    9    9    9    9
29    4    5    5    5    5    5              54    9    9    9    9    9    9
30    5    5    5    5    5    5              55    9    9    9    9    9   10
31    5    5    5    5    5    6              56    9    9    9    9   10   10
32    5    5    5    5    6    6
33    5    5    5    6    6    6
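The block lengths in this table follow a regular pattern: the $L$ values are split into six blocks of nearly equal size, with the longer blocks placed last. A compact way to reproduce the rows is $J_i = \lfloor (L + i - 1)/6 \rfloor$. This formula is an observation that matches the tabulated rows, not a definition quoted from the standard, and the function name is hypothetical.

```python
def block_lengths(L):
    """Return [J_1, ..., J_6] for 9 <= L <= 56, matching the Annex J table."""
    if not 9 <= L <= 56:
        raise ValueError("L must lie in the tabulated range 9..56")
    # Integer division distributes the remainder L mod 6 over the last blocks.
    return [(L + i - 1) // 6 for i in range(1, 7)]


print(block_lengths(9))
print(block_lengths(37))
```

The six lengths always sum to $L$, so every spectral amplitude is assigned to exactly one block.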
Annex K Flow Charts 

Flow Chart 1: IMBE Voice Coder 
(Block diagram; the recoverable block labels are listed in signal-flow order.) 

Encoder 
  Speech Analysis: Pitch Estimation, V/UV Determination, Spectral Amplitude Estimation 
  Parameter Encoding: Fundamental Frequency Encoding, V/UV Decision Encoding, Spectral Amplitude Encoding 
  Bit Manipulation: Bit Prioritization, FEC Encoding, Random Bit Modulation, Bit Interleaving 
Channel 
Decoder 
  Bit Manipulation: Bit De-Interleaving, Random Bit Demodulation, Error Correction Decoding, Bit Rearrangement 
  Parameter Decoding: Fundamental Frequency Decoding, V/UV Decision Decoding, Spectral Amplitude Decoding 
  Parameter Enhancement & Smoothing: Spectral Amplitude Enhancement, V/UV Smoothing, Spectral Amplitude Smoothing 
  Speech Synthesis: Spectral Amplitude Enhancement, Unvoiced Speech Synthesis, Voiced Speech Synthesis 

Flow Chart 2: Initial Pitch Estimation 
Flow Chart 3: Look-Back Pitch Tracking 
Flow Chart 4: Look-Ahead Pitch Tracking (1 of 3) 
Flow Chart 4: Look-Ahead Pitch Tracking (2 of 3) 
Flow Chart 4: Look-Ahead Pitch Tracking (3 of 3) 
Flow Chart 5: V/UV Determination (1 of 2) 
Flow Chart 5: V/UV Determination (2 of 2) 
Flow Chart 6: Unvoiced Speech Synthesis (1 of 2) 
  (Recoverable block labels: Compute $u(n)$ via Equation (117); Compute $U_w(m)$ via Equation (118).) 
Flow Chart 6: Unvoiced Speech Synthesis (2 of 2) 
Flow Chart 7: Voiced Speech Synthesis (1 of 2) 
Flow Chart 7: Voiced Speech Synthesis (2 of 2) 
Flow Chart 8: Spectral Amplitude Enhancement 
Flow Chart 9: Adaptive Smoothing (1 of 2) 
Flow Chart 9: Adaptive Smoothing (2 of 2) 
Flow Chart 10: Encoder Bit Manipulations (1 of 2) 
Flow Chart 10: Encoder Bit Manipulations (2 of 2) 
Flow Chart 11: Decoder Bit Manipulations (1 of 2) 
Flow Chart 11: Decoder Bit Manipulations (2 of 2) 
Flow Chart 12: Pitch Refinement (1 of 2) 
Flow Chart 12: Pitch Refinement (2 of 2) 